Abstract

Many recent works that study the performance of multi-input multi-output (MIMO) systems in practice
assume a Kronecker model where the variances of the channel entries, upon decomposition on to the
transmit and the receive eigen-bases, admit a separable form. Measurement campaigns, however, show
that the Kronecker model results in poor estimates for capacity. Motivated by these observations, a
channel model that does not impose a separable structure has been recently proposed and shown to fit
the capacity of measured channels better. In this work, we show that this recently proposed modeling
framework can be viewed as a natural consequence of channel decomposition on to its canonical
coordinates, the transmit and/or the receive eigen-bases. Using tools from random matrix theory, we then
establish the theoretical basis behind the Kronecker mismatch at the low- and the high-SNR extremes:
1) Sparsity of the dominant statistical degrees of freedom (DoF) in the true channel at the low-SNR
extreme, and 2) Non-regularity of the sparsity structure (disparities in the distribution of the DoF across
the rows and the columns) at the high-SNR extreme.

Index Terms

Correlation, fading channels, information rates, MIMO systems, multiplexing, random matrix theory,
sparse systems.



I. Introduction 

Under the assumption of spatially independent and identically distributed (i.i.d.) Rayleigh 
fading between antenna pairs, multi-input multi-output (MIMO) systems achieve a linear growth 

V. Raghavan is with the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, 
University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA. J. H. Kotecha is with Freescale Semiconductor Inc., 
Austin, TX 78721 USA, and A. M. Sayeed is with the Department of Electrical and Computer Engineering, University of 
Wisconsin-Madison, Madison, WI 53706 USA. Email: vasanthan_raghavan@ieee.org, jayeshkotecha@freescale.com, 
akbar@engr.wisc.edu. 'Corresponding author. This research was supported in part by NSF Grant #CCF-0431088 through the 
University of Wisconsin. 



SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 






in multiplexing gain and coherent capacity with the number of antennas [1], [2]. However, 
the rich scattering assumption is idealistic and most physical channels encountered in practice 
exhibit clustered scattering and spatially correlated links [3]-[5]. Correlated MIMO channels 
have been theoretically studied mainly in the contexts of the separable correlation model (also 
known as the Kronecker model) [3], [4], and the virtual representation framework for uniform 
linear arrays (ULAs) [6]-[9]. The Kronecker model assumes separability in correlation induced 
by the transmitter and the receiver arrays which limits the degrees of freedom (DoF) in modeling 
the channel. Though this model has been shown to be accurate in certain settings (especially 
2x2 scenarios) [10]-[14], the separability assumption limits its applicability to more realistic 
settings where the gains accrued with MIMO make it a viable choice. The virtual representation 
does not assume such separability, but is applicable only for ULAs. 
Contributions: 

• i) In this paper, we develop a unified statistical modeling framework for Rayleigh fading 
MIMO channels based on decomposition of the channel matrix on its canonical coordinates, 
the transmit and/or the receive eigen-bases. Motivated by virtual representation, these 
models do not assume separable statistics and are applicable to general array geometries. 
Like the Kronecker model, the eigen-modes of the scattering environment decide the transmit 
and the receive eigen-bases whereas the canonical channel matrix embodies the statistically 
independent DoF that govern channel capacity and diversity. 

Depending on the covariance structure of the channel, three models arise in which all
the columns/rows/channel entries are uncorrelated. The last case¹, denoted here as the
canonical model (or CM3 for short), has been proposed as the Weichselberger model in [18]
and studied from a capacity analysis viewpoint in [19]. The new contribution in this work is
the unified development of CM3 as a natural consequence of two other models, denoted as
CM1 and CM2. The development of CM1 (and CM2) critically depends on two assumptions
about the covariance and the cross-covariance information of the rows (and the columns) of
the channel matrix. We establish the criticality of these four assumptions in the development
of CM3. To the best of our knowledge, CM1 and CM2 have not been proposed elsewhere in
the literature, and could provide useful intermediate models for certain asymmetric MIMO
systems.

• ii) Many recent works [18], [20]-[26] have shown that the Kronecker model consistently
estimates the capacity of a large class of measured channels poorly and hence they establish
the need² for more accurate channel modeling. For example, [18], [20]-[22] show that the
Kronecker model severely underestimates true capacity whereas under certain conditions,
it could also overestimate the true capacity [24], [26]. Nevertheless, motivated by extensive
measurement studies, the popular belief is that the "probability of overestimation decreases
with increasing antenna number" [26, Footnote 5]. The main focus of this work is to
theoretically explain these observations.

¹ Some of the earlier works of the authors [15], [16] have also suggested this model and developed it independently [17].
² In fact, the development of CM3 in [18] is motivated by these observations.

Towards this goal, we first note that recent measurement campaigns have also observed 
that only a few of the statistical DoF are dominant enough to contribute towards reliable 
communications. That is, measured multi-antenna channels are sparse in the canonical 
domain. Furthermore, the distribution of the sparse DoF across the spatial domain does 
not observe any regularity³ structure. For example, see [26, Figs. 9 and 11], [20], [21], [28],
which plot the sparse, non-regular structure of the Weichselberger coupling matrix that
reflects the statistical DoF in CM3.



Fig. 1. Partitioning of the space of all possible channels based on sparsity and regularity.
[Figure not reproduced; region labels: "Sparse and regular", "Non-sparse and regular (i.i.d.)",
"Non-sparse and non-regular (Near-Kronecker)".]

The foundation for the above experimental evidence lies in theoretical electromagnetic 
studies that explain sparsity of DoF in different contexts in wireless communications [9], 
[29]-[35]. Nevertheless, a simpler communication-theoretic motivation for sparsity is that 
while there may be many channel coefficients whose energy levels are non-zero, they may 
not be strong enough to be estimated accurately at the transmitter, even statistically. It 
becomes impossible or too costly to estimate such coefficients accurately and thus from 
the transmitter's viewpoint, it is reasonable to treat their contributions as noise. Thus, 
we can partition the space of all possible channels into four classes, as in Fig. 1, with
the class of sparse and non-regular channels being the most predominant. In this work,
we develop a mathematical framework for probabilistically modeling sparse multi-antenna
channels. The framework developed here allows us to adjust the average number of dominant
channel coefficients and theoretically study the impact of a Kronecker model on the capacity
mismatch.

³ Let $\mathbf{H}_c$ be an $N_r \times N_t$ random matrix with independent entries and let the variance of $\mathbf{H}_c[i,j]$ be given by $\mathbf{P}_c[i,j]$. A
channel is called column-regular if $\sum_i \mathbf{P}_c[i,j]$ is equal for all $j$, row-regular if the above condition is true for $\mathbf{H}_c^H$, and
regular if it is both row- and column-regular [27]. Otherwise, it is non-regular.

• iii) We assume that the statistics of the true channel in the canonical domain has a sparse, 
non-separable structure. Based on recent works (see [36] and references therein) that estab- 
lish the accuracy of a Gaussian approximation to outage capacity, we also assume that the 
ergodic capacity ($C_{\rm erg}(\rho)$) and the variance of capacity ($V(\rho)$) are the key figures-of-merit.
The main results of this work are obtained for the low- and the high-SNR extremes in the
large-system (antenna) regime, and are summarized in Table I.



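Table I: Summary of Main Results

SNR regime: $\rho \to 0$
  Mismatch metric: $C_{\rm erg,can}(\rho) = C_{\rm erg,kron}(\rho)$ for all channels. The variances of capacity under
  the two models differ through a relation in the sparse model parameters $\{p, \mu, \sigma\}$
  (see Theorem 2 and Prop. 2 for details).
  Conclusions: the mismatch decreases as $\mathbf{H}_c$ approaches the i.i.d. case and increases as
  $\mathbf{H}_c$ becomes sparse.

SNR regime: $\rho \to \infty$
  Mismatch metric: $C_{\rm erg,can}(\rho) - C_{\rm erg,kron}(\rho)$ is governed by
  $\log_2 \left( \frac{{\rm AM}_{\rm row\,pow} \, {\rm AM}_{\rm col\,pow}}{{\rm GM}_{\rm row\,pow} \, {\rm GM}_{\rm col\,pow}} \right)$,
  where AM and GM are the arithmetic and geometric means of the row and column powers of the
  true channel (see Theorem 4 for details).
  Conclusions: the mismatch decreases as $\mathbf{H}_c$ becomes regular and increases as
  $\mathbf{H}_c$ becomes non-regular.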



We show that for almost every sparse channel: a) Using marginal sum statistics to generate 
a Kronecker fit results in an artificial increase in the number of DoF, b) As a result, the 
channel power is spread across the increased DoF, and c) Hence, the Kronecker model 
offers a poor estimate for capacity. Towards establishing this connection, we develop a tight 
approximation for the mean of the log-determinant of random matrices (with independent 
entries) which is of independent interest in MIMO analysis and design. 
In the high-SNR extreme, the Kronecker model underestimates $C_{\rm erg}(\rho)$ for all channels.
The level of underestimation decreases as the channel becomes more regular. In the low-SNR
extreme, $C_{\rm erg}(\rho)$ is the same with either model. The Kronecker model underestimates
$V(\rho)$ with the level of underestimation decreasing as the channel becomes less sparse.
Thus, for a large class of channels that are sparse and non-regular, the Kronecker model
underestimates the outage capacity at all reliability levels (and also the reliability at all
data rates) in the medium- to high-SNR regime. On the other hand, for any channel in the
low-SNR regime, and regular channels in the medium- to high-SNR regime, the Kronecker
model overestimates capacity at high levels of reliability (and reliability at low data rates),
and vice versa.

Organization: This paper is organized as follows. The canonical statistical modeling framework 
for correlated multi-antenna channels is developed in Section II with the key properties of the
proposed model elucidated in Section III. In Section IV, we explore practical modeling issues
and show how the canonical and the Kronecker models are used to describe realistic measured
channels. A brief summary of MIMO capacity issues is provided in Section V with a comparative
study of the two models performed in Section VI. Conclusions are drawn in Section VII.
Notation: We use upper-case and lower-case bold symbols for matrices and vectors, respectively.
If $\mathbf{X}$ is an $M \times N$ matrix, $\mathbf{x} = {\rm vec}(\mathbf{X})$ denotes the $MN \times 1$ vector obtained by stacking
columns of $\mathbf{X}$. The entry in the $m$-th row and $n$-th column, and the $m$-th diagonal entry of
$\mathbf{X}$ are denoted by $\mathbf{X}[m,n]$ and $\mathbf{X}[m] = \mathbf{X}[m,m]$, respectively. The complex conjugate, regular
transpose and Hermitian transpose of $\mathbf{X}$ are denoted by $\mathbf{X}^*$, $\mathbf{X}^T$ and $\mathbf{X}^H$ while its inverse, trace
and determinant are denoted by $\mathbf{X}^{-1}$, ${\rm Tr}(\mathbf{X})$ and $\det(\mathbf{X})$, respectively. The operators $E[\cdot]$, $\otimes$
and $\odot$ stand for expectation, Kronecker and Hadamard products. The indicator function of a
set $\mathcal{A}$ and its probability are given by $\chi(\mathcal{A})$ and $\Pr(\mathcal{A})$. We use the standard big-Oh ($O$) and
little-oh ($o$) notations, $\sim$ for equality in distribution, and $X \sim \mathcal{CN}(\mu, \sigma^2)$ to indicate that $X$ is
a complex Gaussian random variable with mean $\mu$ and variance $\sigma^2$.



II. Canonical Modeling of Correlated MIMO Channels 

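Consider a narrowband, Rayleigh fading MIMO channel with $N_t$ transmit and $N_r$ receive
antennas. The $N_r \times 1$ received vector $\mathbf{y}$ is related to the $N_t \times 1$ transmit vector $\mathbf{x}$ by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$

where $\mathbf{H}$ is the $N_r \times N_t$ channel matrix and $\mathbf{n}$ is the independent, white Gaussian noise added
at the receiver. The entries of $\mathbf{H}$ are zero mean, complex Gaussian that satisfy

$$\mathbf{h} = {\rm vec}(\mathbf{H}) \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}) \tag{2}$$

for some positive semi-definite channel covariance matrix $\mathbf{R}$.

We now describe three canonical decompositions of MIMO channels. Let $\mathbf{Q}_t = E[\mathbf{H}^H\mathbf{H}]$
and $\mathbf{Q}_r = E[\mathbf{H}\mathbf{H}^H]$ denote the transmit and the receive covariance matrices. Let their respective
eigen-decompositions be given by $\mathbf{Q}_t = \mathbf{U}_t \boldsymbol{\Lambda}_t \mathbf{U}_t^H$ and $\mathbf{Q}_r = \mathbf{U}_r \boldsymbol{\Lambda}_r \mathbf{U}_r^H$ where the columns of $\mathbf{U}_t$
and $\mathbf{U}_r$ are eigenvectors of $\mathbf{Q}_t$ and $\mathbf{Q}_r$ with the corresponding eigenvalues denoted by diagonal
entries of $\boldsymbol{\Lambda}_t$ and $\boldsymbol{\Lambda}_r$.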
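Lemma 1: Any channel matrix $\mathbf{H}$ can be written in the canonical form: $\mathbf{H} = \mathbf{H}_t \mathbf{U}_t^H$ such
that $E[\mathbf{H}_t^H \mathbf{H}_t] = \boldsymbol{\Lambda}_t$ and

$$\mathbf{h}_t \triangleq {\rm vec}(\mathbf{H}_t) = {\rm vec}(\mathbf{H}\mathbf{U}_t) = (\mathbf{U}_t^T \otimes \mathbf{I})\,\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_t) \tag{3}$$
$$\mathbf{R}_t = E[\mathbf{h}_t \mathbf{h}_t^H] = (\mathbf{U}_t^T \otimes \mathbf{I})\,\mathbf{R}\,(\mathbf{U}_t^T \otimes \mathbf{I})^H. \tag{4}$$

Similarly, any channel matrix $\mathbf{H}$ can be written in the canonical form: $\mathbf{H} = \mathbf{U}_r \mathbf{H}_r$ such that
$E[\mathbf{H}_r \mathbf{H}_r^H] = \boldsymbol{\Lambda}_r$ and

$$\mathbf{h}_r \triangleq {\rm vec}(\mathbf{H}_r) \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_r) \tag{5}$$
$$\mathbf{R}_r = E[\mathbf{h}_r \mathbf{h}_r^H] = (\mathbf{I} \otimes \mathbf{U}_r^H)\,\mathbf{R}\,(\mathbf{I} \otimes \mathbf{U}_r). \tag{6}$$

Furthermore, $\mathbf{H}$ can also be written in the canonical form: $\mathbf{H} = \mathbf{U}_r \mathbf{H}_c \mathbf{U}_t^H$ such that $E[\mathbf{H}_c^H \mathbf{H}_c] =
\boldsymbol{\Lambda}_t$, $E[\mathbf{H}_c \mathbf{H}_c^H] = \boldsymbol{\Lambda}_r$ and

$$\mathbf{h}_c \triangleq {\rm vec}(\mathbf{H}_c) \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_c) \tag{7}$$
$$\mathbf{R}_c \triangleq E[\mathbf{h}_c \mathbf{h}_c^H] = (\mathbf{U}_t^T \otimes \mathbf{U}_r^H)\,\mathbf{R}\,(\mathbf{U}_t^T \otimes \mathbf{U}_r^H)^H. \tag{8}$$

Proof: The proof is immediate by using the relation: ${\rm vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = (\mathbf{C}^T \otimes \mathbf{A})\,{\rm vec}(\mathbf{B})$. ■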
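The vec identity behind Lemma 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the dimensions, the random unitary matrices standing in for the eigen-bases, and the helper names `vec` and `random_unitary` are all hypothetical and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def random_unitary(n, rng):
    """Unitary matrix taken from the QR factorization of a complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

Nr, Nt = 3, 4  # hypothetical antenna counts
H = rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))
Ur, Ut = random_unitary(Nr, rng), random_unitary(Nt, rng)  # stand-ins for the eigen-bases

# Canonical coordinates: H = Ur Hc Ut^H  <=>  Hc = Ur^H H Ut
Hc = Ur.conj().T @ H @ Ut

# vec(A B C) = (C^T kron A) vec(B) with A = Ur^H, B = H, C = Ut
T = np.kron(Ut.T, Ur.conj().T)
identity_holds = np.allclose(vec(Hc), T @ vec(H))

# T is unitary, so the map between R and Rc in Lemma 1 can be inverted
T_is_unitary = np.allclose(T @ T.conj().T, np.eye(Nr * Nt))
reconstructs = np.allclose(Ur @ Hc @ Ut.conj().T, H)
```

Because the transformation matrix is unitary, moving between the antenna domain and the canonical domain changes neither the total channel power nor the rank of the covariance matrix.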
It is possible to obtain interesting, yet realistic statistical models that allow tractable
performance analysis if we make the following simplifying assumptions. The following notation is
used: The vectors $\mathbf{g}_i$ (and $\mathbf{h}_j$) denote the $i$-th (and the $j$-th) column of $\mathbf{H}^T$ (and $\mathbf{H}$), respectively,
i.e. $\mathbf{H} = [\mathbf{h}_1 \, \ldots \, \mathbf{h}_{N_t}] = [\mathbf{g}_1 \, \ldots \, \mathbf{g}_{N_r}]^T$.



A. Canonical Model 1 (CM1)

We denote by CM1 a channel that satisfies the following two assumptions.
Assumption 1: The covariance matrices of all rows of $\mathbf{H}$ have the columns of $\mathbf{U}_t$ as a set of
common eigenvectors. That is, $E[\mathbf{g}_i \mathbf{g}_i^H] = \mathbf{U}_t^* \boldsymbol{\Lambda}_t^{i} \mathbf{U}_t^T$ for some positive semi-definite diagonal
matrix $\boldsymbol{\Lambda}_t^{i}$.

Assumption 2: The cross-covariance matrices of the rows of $\mathbf{H}$ also have a set of common
eigenvectors, given by columns of $\mathbf{U}_t$, i.e. $E[\mathbf{g}_i \mathbf{g}_j^H] = \mathbf{U}_t^* \boldsymbol{\Lambda}_t^{ij} \mathbf{U}_t^T$ for all $i \neq j$ and
some diagonal $\boldsymbol{\Lambda}_t^{ij}$. In general, $\boldsymbol{\Lambda}_t^{ij}$ need not be positive semi-definite because $E[\mathbf{g}_i \mathbf{g}_j^H]$ is
not Hermitian.

Then, $\mathbf{H}$ can be written as $\mathbf{H} = \mathbf{H}_t \mathbf{U}_t^H$ with the following properties:

• The covariance matrix of each row of $\mathbf{H}_t$ (denoted by $\{\mathbf{g}_{t,i}\}$), given by $E[\mathbf{g}_{t,i} \mathbf{g}_{t,i}^H]$, is
diagonal. This follows directly from Assumption 1 and the fact that $\mathbf{g}_{t,i} = \mathbf{U}_t^T \mathbf{g}_i$.

• The columns of $\mathbf{H}_t$ (denoted by $\{\mathbf{h}_{t,j}\}$) are uncorrelated with each other, i.e. $E[\mathbf{h}_{t,i} \mathbf{h}_{t,j}^H] = \mathbf{0}$
for all $i, j$, $i \neq j$. However, the columns may have arbitrary covariances. This is because
the $(m,n)$-th entry of the cross-covariance between the $i$-th and the $j$-th columns of $\mathbf{H}_t$ is
given by $E[\mathbf{g}_{t,m}[i] \, \mathbf{g}_{t,n}^*[j]]$ which can be seen from Assumptions 1 and 2 to be $\delta(i - j)\boldsymbol{\Lambda}_t^{mn}[j]$
with $\delta(\cdot)$ denoting the Kronecker delta.

We summarize these conclusions in the form of Lemma 2.

Lemma 2 (CM1): Under Assumptions 1 and 2, any channel can be written as $\mathbf{H} = \mathbf{H}_t \mathbf{U}_t^H$
where the columns of $\mathbf{H}_t$ are uncorrelated with each other. ■

B. Canonical Model 2 (CM2)

We denote by CM2 a channel that satisfies the following two assumptions.
Assumption 3: The covariance matrices of the columns of $\mathbf{H}$, given by $E[\mathbf{h}_i \mathbf{h}_i^H]$, have a set of
common eigenvectors, independent of $i$. We assume that these eigenvectors are columns of $\mathbf{U}_r$.
Assumption 4: The cross-covariance matrices of columns of $\mathbf{H}$ also have a common set of
eigenvectors, given by columns of $\mathbf{U}_r$, i.e. $E[\mathbf{h}_i \mathbf{h}_j^H] = \mathbf{U}_r \boldsymbol{\Lambda}_r^{ij} \mathbf{U}_r^H$ for all $i \neq j$ and some
diagonal $\boldsymbol{\Lambda}_r^{ij}$.

Lemma 3 (CM2): Under Assumptions 3 and 4, any channel can be written as $\mathbf{H} = \mathbf{U}_r \mathbf{H}_r$
where the rows of $\mathbf{H}_r$ are uncorrelated with each other. ■
It is clear that CM2 is the dual of CM1. In CM1, the transmitter sees parallel channels in the
eigen-domain while in CM2, the receiver sees parallel channels in the eigen-domain.

C. Canonical Model 3 (CM3)

We denote a channel which follows Assumptions 1-4 as CM3. This model has been developed
independently in [17]-[19].

Lemma 4 (CM3): Under Assumptions 1-4, any channel can be written as $\mathbf{H} = \mathbf{U}_r \mathbf{H}_c \mathbf{U}_t^H$
where the entries of $\mathbf{H}_c$ are uncorrelated, but not necessarily identically distributed.

Proof: See the appendix. ■
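Exploiting the fact that the entries of $\mathbf{H}_c$ are uncorrelated under CM3, the channel covariance
matrix can be written as

$$\mathbf{R} = (\mathbf{U}_t^T \otimes \mathbf{U}_r^H)^H \, \mathbf{R}_c \, (\mathbf{U}_t^T \otimes \mathbf{U}_r^H) \tag{9}$$

where $\mathbf{R}_c$ is diagonal. First, note that the right-hand side of (9) is an eigen-decomposition of
$\mathbf{R}$. The matrices $\mathbf{U}_t$ and $\mathbf{U}_r$, which are a set of eigenvector matrices of $\mathbf{Q}_t = E[\mathbf{H}^H\mathbf{H}]$ and
$\mathbf{Q}_r = E[\mathbf{H}\mathbf{H}^H]$, can be interpreted as transmit and receive eigen-matrices, respectively. Clearly,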
CM3 is a special case of CM1 (and CM2) where the covariance matrices of the columns (rows)
of $\mathbf{H}_t$ ($\mathbf{H}_r$) have the same eigen-matrix $\mathbf{U}_r$ ($\mathbf{U}_t$). In fact, CM3 is an intersection of CM1 and
CM2. Our primary focus in the rest of the paper is on CM3, which we will label as the canonical
model. We also define the spatial power matrix $\mathbf{P}_c$ by the relationship

$$\mathbf{P}_c[i,j] \triangleq E\left[ |\mathbf{H}_c[i,j]|^2 \right] \tag{10}$$

and note that the diagonal entries of $\mathbf{R}_c$ correspond to $\{\mathbf{P}_c[i,j]\}$. Henceforth, we will use this
alternate characterization of $\mathbf{R}_c$. We now identify some of the key properties of CM3.

III. Properties of The Canonical Model

A. Relation to Other Channel Models 

We show how two well-known channel models, the Kronecker model and the virtual
representation framework, can be regarded as special cases of the canonical model.

1) Kronecker Model: The Kronecker model has been used in [3], [4] and also in many
recent works under the assumption that the transmitter and the receiver are surrounded by local
scatterers. This model is verified by measurement campaigns for certain environments in [10]-[14].
It assumes separable statistics at the transmitter and the receiver and is given by

$$\mathbf{H} = \boldsymbol{\Sigma}_r^{1/2} \, \mathbf{H}_{\rm iid} \, \boldsymbol{\Sigma}_t^{1/2}, \quad \mathbf{R}_k = E\left[ {\rm vec}(\mathbf{H}) {\rm vec}(\mathbf{H})^H \right] = \boldsymbol{\Sigma}_t^T \otimes \boldsymbol{\Sigma}_r \tag{11}$$

where the entries of $\mathbf{H}_{\rm iid}$ are i.i.d. $\mathcal{CN}(0, 1)$, and $\boldsymbol{\Sigma}_t$ and $\boldsymbol{\Sigma}_r$ are the transmit and the receive
covariance matrices, respectively.

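Recall from the discussion in Sec. II that

$$\mathbf{Q}_t = E[\mathbf{H}^H\mathbf{H}] = \mathbf{U}_t \boldsymbol{\Lambda}_t \mathbf{U}_t^H, \quad \mathbf{Q}_r = E[\mathbf{H}\mathbf{H}^H] = \mathbf{U}_r \boldsymbol{\Lambda}_r \mathbf{U}_r^H, \tag{12}$$

$$\mathbf{R} = E\left[ {\rm vec}(\mathbf{H}) {\rm vec}(\mathbf{H})^H \right], \quad \text{and} \quad {\rm Tr}(\mathbf{R}) = {\rm Tr}(\mathbf{Q}_t) = {\rm Tr}(\mathbf{Q}_r). \tag{13}$$

From (11), we then have the following relations for the Kronecker model:

$$\mathbf{Q}_t = \boldsymbol{\Sigma}_t^{1/2} \, E\left[ \mathbf{H}_{\rm iid}^H \boldsymbol{\Sigma}_r \mathbf{H}_{\rm iid} \right] \boldsymbol{\Sigma}_t^{1/2} = \boldsymbol{\Sigma}_t \, {\rm Tr}(\boldsymbol{\Sigma}_r), \tag{14}$$

$$\mathbf{Q}_r = \boldsymbol{\Sigma}_r^{1/2} \, E\left[ \mathbf{H}_{\rm iid} \boldsymbol{\Sigma}_t \mathbf{H}_{\rm iid}^H \right] \boldsymbol{\Sigma}_r^{1/2} = \boldsymbol{\Sigma}_r \, {\rm Tr}(\boldsymbol{\Sigma}_t). \tag{15}$$

The fact that $\mathbf{Q}_t$ and $\boldsymbol{\Sigma}_t$ (and likewise $\mathbf{Q}_r$ and $\boldsymbol{\Sigma}_r$) are scaled versions of each other implies that
eigen-decompositions of $\boldsymbol{\Sigma}_t$ and $\boldsymbol{\Sigma}_r$ are given by $\mathbf{U}_t \boldsymbol{\Lambda}_{t,k} \mathbf{U}_t^H$ and $\mathbf{U}_r \boldsymbol{\Lambda}_{r,k} \mathbf{U}_r^H$ where
$\boldsymbol{\Lambda}_{t,k} = \boldsymbol{\Lambda}_t / {\rm Tr}(\boldsymbol{\Sigma}_r)$ and $\boldsymbol{\Lambda}_{r,k} = \boldsymbol{\Lambda}_r / {\rm Tr}(\boldsymbol{\Sigma}_t)$, respectively. Further,

$${\rm Tr}(\mathbf{Q}_t) = {\rm Tr}(\mathbf{Q}_r) = {\rm Tr}(\boldsymbol{\Sigma}_t) \, {\rm Tr}(\boldsymbol{\Sigma}_r) = {\rm Tr}(\mathbf{R}). \tag{16}$$

Thus, we can write the channel in (11) in the CM3 form as

$$r \, \mathbf{H} = \mathbf{U}_r \boldsymbol{\Lambda}_r^{1/2} \mathbf{U}_r^H \, \mathbf{H}_{\rm iid} \, \mathbf{U}_t \boldsymbol{\Lambda}_t^{1/2} \mathbf{U}_t^H = \mathbf{U}_r \mathbf{H}_c \mathbf{U}_t^H \tag{17}$$

where $r = [{\rm Tr}(\mathbf{R})]^{1/2}$. The entries of $\mathbf{H}_c$ are uncorrelated with covariance matrix $\mathbf{R}_c$ where

$$\mathbf{H}_c = \boldsymbol{\Lambda}_r^{1/2} \mathbf{U}_r^H \, \mathbf{H}_{\rm iid} \, \mathbf{U}_t \boldsymbol{\Lambda}_t^{1/2} \overset{(a)}{=} \boldsymbol{\Lambda}_r^{1/2} \, \mathbf{H}_{\rm iid} \, \boldsymbol{\Lambda}_t^{1/2} \tag{18}$$
$$\mathbf{R}_c = E[\mathbf{h}_c \mathbf{h}_c^H] = \boldsymbol{\Lambda}_t \otimes \boldsymbol{\Lambda}_r, \quad \mathbf{h}_c = {\rm vec}(\mathbf{H}_c). \tag{19}$$

The equality in (a) of (18) arises from the invariance of the distribution of $\mathbf{H}_{\rm iid}$ under left and
right unitary multiplications [1].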
The spatial power matrix for the Kronecker model is denoted by $\mathbf{P}_k$. The Kronecker structure
of $\mathbf{R}_c$ implies that the $i$-th column vector of $\mathbf{P}_k$ (denoted by $\mathbf{P}_{k,i}$) is

$$\mathbf{P}_{k,i} = \boldsymbol{\Lambda}_t[i] \cdot \big[ \boldsymbol{\Lambda}_r[1], \, \boldsymbol{\Lambda}_r[2], \, \ldots, \, \boldsymbol{\Lambda}_r[N_r] \big]^T. \tag{20}$$

Note that this is a direct consequence of assuming separable statistics for $\mathbf{H}$ and does not hold in
general. For the general case, separation of the transmit and receive domains can be artificially
induced by using the marginal sum statistics (see Prop. 1). This fact highlights the limitations of
the Kronecker model. The canonical framework results in a richer class of channels since it does
not assume separability and an arbitrary diagonal $\mathbf{R}_c$ is needed to model a general channel.
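The separable structure described above can be made concrete with a small numerical sketch. The eigenvalue vectors `lam_t` and `lam_r` below are hypothetical values chosen only for illustration; the point is that a covariance of the form $\boldsymbol{\Lambda}_t \otimes \boldsymbol{\Lambda}_r$ always yields a rank-one spatial power matrix.

```python
import numpy as np

lam_t = np.array([3.0, 1.0])        # hypothetical transmit eigenvalues (Nt = 2)
lam_r = np.array([2.0, 1.5, 0.5])   # hypothetical receive eigenvalues (Nr = 3)

# Kronecker-structured canonical covariance, as in the form Lambda_t kron Lambda_r
Rc = np.kron(np.diag(lam_t), np.diag(lam_r))

# vec stacks columns, so the diagonal of Rc lists the power matrix column by column
Pk = np.diag(Rc).reshape(len(lam_r), len(lam_t), order="F")

# Column i of Pk is lam_t[i] times the vector of receive eigenvalues
separable = np.allclose(Pk, np.outer(lam_r, lam_t))
rank_of_Pk = np.linalg.matrix_rank(Pk)
```

A general canonical channel has no such constraint: its spatial power matrix may have any rank up to $\min(N_t, N_r)$, which is what the rank-one outer product cannot capture.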

2) ULAs and the Virtual Representation: In [6], a virtual representation framework is proposed
for systems with ULAs at both the transmitter and the receiver. In this case, $\mathbf{H}$ can be written
as $\mathbf{A}_r \mathbf{H}_v \mathbf{A}_t^H$ where $\mathbf{A}_r$ and $\mathbf{A}_t$ are discrete Fourier transform (DFT) matrices. It is argued in
[6] that the entries of $\mathbf{H}_v$ are approximately uncorrelated for a finite number of antennas and the
approximation becomes increasingly accurate as antenna dimensions increase. Thus, $\mathbf{A}_t$ and $\mathbf{A}_r$
serve as eigen-matrices in the virtual representation framework:

$$E[\mathbf{H}^H\mathbf{H}] = \mathbf{A}_t \boldsymbol{\Lambda}_{t,v} \mathbf{A}_t^H, \quad \boldsymbol{\Lambda}_{t,v} = E[\mathbf{H}_v^H \mathbf{H}_v], \tag{21}$$
$$E[\mathbf{H}\mathbf{H}^H] = \mathbf{A}_r \boldsymbol{\Lambda}_{r,v} \mathbf{A}_r^H, \quad \boldsymbol{\Lambda}_{r,v} = E[\mathbf{H}_v \mathbf{H}_v^H]. \tag{22}$$

Furthermore, Assumptions 1-4 made in the context of CM3 are satisfied by virtual
representation. An important point to note is that while the transmit and the receive bases in CM3 are a
function of the channel statistics and the entries of the canonical decomposition are exactly
uncorrelated, the eigen-matrices $\mathbf{A}_t$ and $\mathbf{A}_r$ are fixed DFT matrices and entries of $\mathbf{H}_v$ are
approximately uncorrelated. Thus, in addition to the fact that the virtual representation for ULAs
provides an intuitive physical interpretation where the eigenvectors $\mathbf{A}_t$ and $\mathbf{A}_r$ are beams in
fixed virtual directions, it also makes transmit signal design easier since the transmit and the
receive bases are fixed and do not change with the channel statistics.
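As a rough illustration of the fixed-basis decomposition, the sketch below maps a channel matrix to a virtual-domain matrix with unitary DFT matrices and back. The helper `dft_matrix`, the dimensions and the random test channel are assumptions made here; the exact indexing and phase convention used for the DFT matrices in [6] may differ.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix (one common convention; an assumption here)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

Nr, Nt = 4, 6  # hypothetical ULA sizes
A_r, A_t = dft_matrix(Nr), dft_matrix(Nt)

rng = np.random.default_rng(0)
H = rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))

# Virtual-domain matrix: H = A_r H_v A_t^H  <=>  H_v = A_r^H H A_t
H_v = A_r.conj().T @ H @ A_t
H_back = A_r @ H_v @ A_t.conj().T

bases_unitary = (np.allclose(A_r @ A_r.conj().T, np.eye(Nr))
                 and np.allclose(A_t @ A_t.conj().T, np.eye(Nt)))
round_trip_ok = np.allclose(H_back, H)
```

Unlike the statistics-dependent eigen-bases of the canonical model, these matrices depend only on the array sizes, so they can be computed once and reused.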

B. Transmit-Receive Eigen-spaces and Their Interaction 

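The decomposition of CM3 provides an equivalent representation in the eigen domain:

$$\mathbf{y}_c = \mathbf{H}_c \mathbf{x}_c + \mathbf{n}_c \tag{23}$$

where

$$\mathbf{y}_c \triangleq \mathbf{U}_r^H \mathbf{y}, \quad \mathbf{x}_c \triangleq \mathbf{U}_t^H \mathbf{x}, \quad \text{and} \quad \mathbf{n}_c \triangleq \mathbf{U}_r^H \mathbf{n}. \tag{24}$$

Thus, a linear transformation at the transmitter and the receiver results in $\mathbf{H}_c$ with independent
entries. We note the following points.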






• Joint statistics - $\mathbf{H}_c$ captures the joint transmitter-receiver statistics given by $\mathbf{P}_c$ which are
in general non-separable, in comparison with the separable statistics of $\mathbf{P}_k$.

• Degrees of freedom - Define the DoF available in the channel as the number of entries of $\mathbf{H}_c$ having
non-zero⁴ variance. Thus,

$${\rm DoF} = {\rm rank}(\mathbf{R}_c) = {\rm rank}(\mathbf{R}) \le N_t N_r. \tag{25}$$

The i.i.d. channel has DoF $= N_t N_r$ and all the DoF have equal power. In correlated
channels, however, the DoF is smaller and these DoF do not have equal power.

• Parallel channels - The parallel channels in the i.i.d. case have identical statistics and number
$\min(N_t, N_r)$. In correlated channels, the non-zero columns of $\mathbf{P}_c$ expose the number of
available parallel channels which is less than $\min(N_t, N_r)$, in general.

The last two observations signify the key differences in correlated versus i.i.d. channel modeling. 
Since the DoF and parallel channels are unequal, they should be excited appropriately for optimal 
transmission. While the canonical model does not provide the same physical insight as the virtual 
representation (e.g., path partitioning), the mathematical similarities between the two models can 
be exploited. This is witnessed by many recent works that explore the impact of independent 
entries in the case of virtual representation. See e.g., [16] for channel estimation; [7], [8], [19] and
Sec. VI of this paper for capacity analysis; [37]-[39] for limited feedback system design; [40],
[41] for non-coherent signal design; [42], [43] for space-time code design etc.
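The DoF and parallel-channel counts discussed above can be read off a spatial power matrix directly. The following is a minimal sketch: the function names, the optional threshold argument (in the spirit of the "effective rank" remark in the footnote) and the two example matrices are hypothetical.

```python
import numpy as np

def effective_dof(Pc, threshold=0.0):
    """Number of entries of the spatial power matrix whose variance exceeds the threshold."""
    return int(np.count_nonzero(Pc > threshold))

def num_active_columns(Pc, threshold=0.0):
    """Number of columns of Pc with at least one entry above the threshold."""
    return int(np.count_nonzero((Pc > threshold).any(axis=0)))

# Hypothetical 4 x 4 examples, both normalized so that the entries sum to Nt * Nr = 16
Pc_iid = np.ones((4, 4))
Pc_sparse = np.array([[8.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 0.0],
                      [4.0, 0.0, 4.0, 0.0],
                      [0.0, 0.0, 0.0, 0.0]])

dof_iid, dof_sparse = effective_dof(Pc_iid), effective_dof(Pc_sparse)
cols_iid, cols_sparse = num_active_columns(Pc_iid), num_active_columns(Pc_sparse)
```

In the sparse example the same total power is carried by three entries in two columns, which is the kind of unequal, concentrated structure that a transmission scheme has to account for.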

IV. Statistical Models for Measured Channels 

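In this work, we adopt the standard channel power normalization used in the MIMO literature:
$\rho_c = N_t N_r$, where $\rho_c$ is defined as

$$\rho_c \triangleq E[{\rm Tr}(\mathbf{H}\mathbf{H}^H)] = E[{\rm Tr}(\mathbf{H}_c \mathbf{H}_c^H)] = {\rm Tr}(\mathbf{R}) = {\rm Tr}(\mathbf{R}_c) = \sum_{i,j} \mathbf{P}_c[i,j]. \tag{26}$$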

A. Fitting Measured Channels with a Kronecker Model 

Even though some initial studies [10]-[14] indicate that the Kronecker model is a good fit for
2x2 scenarios, further studies [18], [20]-[26] show that a non-separable modeling framework is
more accurate. A non-separable framework in a Rayleigh fading setting is characterized by $N_t N_r$
statistical parameters, namely $\{\mathbf{P}_c[i,j]\}$. Initial difficulties on the tractability of the performance
analysis of MIMO channels with such a general statistical description have led to the popularity
of fitting the measured channel with a model characterized by fewer parameters. The following
proposition illustrates how a physical channel generated assuming non-separable statistics can
be fitted with a Kronecker model.

⁴ In practice, it is reasonable to define DoF as the number of entries in $\mathbf{P}_c$ that are larger than an a priori-determined threshold.
The term "rank" in (25) should then be replaced with an appropriate definition of "effective rank."

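Proposition 1: Consider a channel under CM3: $\mathbf{H} = \mathbf{U}_r \mathbf{H}_c \mathbf{U}_t^H$ with $\mathbf{P}_c[i,j] = E[|\mathbf{H}_c[i,j]|^2]$.
A Kronecker fit for $\mathbf{H}$ is of the form $\mathbf{U}_r \mathbf{H}_k \mathbf{U}_t^H$ where $\mathbf{H}_k[i,j] \sim \mathcal{CN}(0, \mathbf{P}_k[i,j])$ with

$$\mathbf{P}_k[i,j] = \frac{\left( \sum_{l} \mathbf{P}_c[i,l] \right) \left( \sum_{k} \mathbf{P}_c[k,j] \right)}{\sum_{k,l} \mathbf{P}_c[k,l]}. \tag{27}$$

Furthermore, the mapping in (27) always increases the DoF in the Kronecker fit for a scattering
environment described by CM3.

Proof: Given a channel $\mathbf{H}$ that follows CM3, we attempt to fit to it a channel that follows
the Kronecker model. From Sec. III-A, the general form of the fit is $\mathbf{U}_{rk} \boldsymbol{\Lambda}_{rk}^{1/2} \mathbf{H}_{\rm iid} \boldsymbol{\Lambda}_{tk}^{1/2} \mathbf{U}_{tk}^H$ for
some appropriate choice of $\mathbf{U}_{tk}$, $\mathbf{U}_{rk}$, $\boldsymbol{\Lambda}_{tk}$ and $\boldsymbol{\Lambda}_{rk}$. By comparing the transmit and the receive
covariance matrices with the two expansions, it can be checked that $\mathbf{U}_{tk} = \mathbf{U}_t$, $\mathbf{U}_{rk} = \mathbf{U}_r$ and
$\mathbf{P}_k[i,j] = \boldsymbol{\Lambda}_{rk}[i] \, \boldsymbol{\Lambda}_{tk}[j]$ has to satisfy the relationship in (27). For the second part, note that

$$\mathbf{P}_k[i,j] \ge \frac{\left( \mathbf{P}_c[i,j] \right)^2}{\sum_{k,l} \mathbf{P}_c[k,l]} \tag{28}$$

and hence, $\mathbf{P}_k[i,j]$ is non-zero if $\mathbf{P}_c[i,j]$ is. Thus, the DoF in the Kronecker fit is always larger
than the actual DoF with the canonical model. ■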
As an extreme artificial example of the above trend, consider a $4 \times 4$ system where the CM3
channel has DoF = 4 and spatial power matrix $\mathbf{P}_c$ as in (29) below. It maps to a Kronecker
model with DoF = 16 and spatial power matrix $\mathbf{P}_k$ as below:

[Equation (29): the $4 \times 4$ matrices $\mathbf{P}_c$ and $\mathbf{P}_k$; the entries did not survive extraction.]



In general, the Kronecker model spreads the degrees of freedom across the resulting $\mathbf{P}_k$ and
thereby 'flattens' it since its statistics are based only on column and row sum statistics of the
actual spatial power matrix. Note that while the transformation from $\mathbf{R}_c$ to $\mathbf{R}_k$ could lead to a
change in rank, the transformation⁵ from $\mathbf{H}_c$ to $\mathbf{H}_k$ does not.
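The flattening effect can be reproduced with a few lines of code. This is a sketch under an explicit assumption: the Kronecker fit is taken to be the product of the row and column sums of $\mathbf{P}_c$ divided by the total power, which is how the marginal-sum mapping of Prop. 1 is read here. The function name `kronecker_fit` and the diagonal example matrix are hypothetical and are not claimed to be the matrices of the paper's own example.

```python
import numpy as np

def kronecker_fit(Pc):
    """Separable fit built only from the row sums and column sums of Pc (assumed form)."""
    row_sums = Pc.sum(axis=1, keepdims=True)   # receive-side marginal statistics
    col_sums = Pc.sum(axis=0, keepdims=True)   # transmit-side marginal statistics
    return row_sums @ col_sums / Pc.sum()

# Hypothetical 4 x 4 canonical power matrix with 4 dominant entries (total power 16)
Pc = 4.0 * np.eye(4)
Pk = kronecker_fit(Pc)

dof_canonical = int(np.count_nonzero(Pc))
dof_kronecker = int(np.count_nonzero(Pk))

# The fit preserves the total power and both sets of marginal sums
same_total = np.isclose(Pk.sum(), Pc.sum())
same_marginals = (np.allclose(Pk.sum(axis=0), Pc.sum(axis=0))
                  and np.allclose(Pk.sum(axis=1), Pc.sum(axis=1)))
```

For this input every entry of the fitted matrix equals one, so the four strong degrees of freedom are replaced by sixteen weak ones even though the total power and the marginal sums are unchanged.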



B. Modeling Sparsity Mathematically 

Another property suggested by fundamental electromagnetic studies [9], [29]-[35] as well as
recent measurement campaigns [26, Figs. 9 and 11], [20], [21], [28] is that only a small subset
of the $N_t N_r$ statistical parameters in CM3 are dominant enough to be leveraged towards reliable
communications over practical SNR ranges. That is, measured wireless channels are sparse.

⁵ As long as no row or column of $\mathbf{P}_c$ (and $\mathbf{P}_k$, following Prop. 1) is completely zero, the event where $\mathbf{H}_c$ (similarly for $\mathbf{H}_k$)
is singular is a zero probability event [44].

In this work, we compare the trends of the canonical and the Kronecker models across a
large family of correlated/sparse channels. We provide two simple mathematical frameworks to
generate large families of channel correlation information under CM3 and hence, from Prop. 1,
under the Kronecker model. For this, we write $\mathbf{P}_c[i,j]$ as

$$\mathbf{P}_c[i,j] = N_t N_r \cdot \frac{p_{ij}}{\sum_{k,l} p_{kl}} \tag{30}$$

where $\{p_{ij}\}$ is a family of $N_t N_r$ random variables supported on $[0, 1]$ that correspond to
unnormalized⁶ variances.

In sparse framework I, we set⁷ $\{p_{ij}\}$ to be i.i.d. with common mean and variance, $\mu$ and
$\sigma^2$, respectively. A typical rich environment (intuitively, a 'near-i.i.d.' environment) is obtained
by setting $\sigma^2 \approx 0$ with an i.i.d. channel corresponding to the extreme case of $\sigma^2 = 0$. As $\sigma^2$
increases, subject to the condition that $\sigma^2 \le 1 - \mu^2$ (since $p_{ij}$ are supported on $[0, 1]$), $\{p_{ij}\}$ get
'well-spread out' around $\mu$. That is, there exists a large variability in the values of $\{\mathbf{P}_c[i,j]\}$,
which intuitively reflects a correlated/sparse setting.

Despite a precise recipe for modeling in framework I, it could be difficult to systematically
generate extremely sparse channels (where the fraction of dominant entries vanishes). In such
settings, we propose sparse framework II in which we set $p_{ij}$ as

$$p_{ij} = q_{ij} \, s_{ij} \tag{31}$$

where $q_{ij}$ is generated as described above (in framework I) and $s_{ij}$ is an i.i.d. family of binary
(0 or 1)-valued random variables with

$$\Pr(s_{ij} = 1) = p = 1 - \Pr(s_{ij} = 0). \tag{32}$$

Sparse channels can be generated systematically by adjusting the value of $p$ appropriately. As
$p$ increases, the channel generated via (31) becomes richer with the two frameworks
coinciding for $p = 1$.
Note that frameworks I and II provide simple mathematical abstractions to model sparsity 
and their applicability in practice needs to be substantiated with further measurement studies. 
Nevertheless, as we will see, these simple models provide engineering intuition on the trends of 
capacity behavior. 
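The two frameworks can be sketched in code as follows. The text only fixes the support, mean and variance of the unnormalized variances, so the Beta distribution used below is an assumption made purely for illustration, as are the function names and parameter values.

```python
import numpy as np

def spatial_power(p):
    """Normalize unnormalized variances so that the entries sum to Nr * Nt."""
    total = p.sum()
    if total == 0:
        raise ValueError("all entries are zero; cannot normalize")
    Nr, Nt = p.shape
    return Nr * Nt * p / total

def framework_one(Nr, Nt, mu, sigma2, rng):
    """i.i.d. values on [0, 1] with mean mu and variance sigma2 (Beta law: an assumption)."""
    if sigma2 == 0:
        return np.full((Nr, Nt), mu)
    nu = mu * (1.0 - mu) / sigma2 - 1.0   # Beta(mu*nu, (1-mu)*nu) has the requested moments
    if nu <= 0:
        raise ValueError("sigma2 must be smaller than mu * (1 - mu) for a Beta law")
    return rng.beta(mu * nu, (1.0 - mu) * nu, size=(Nr, Nt))

def framework_two(Nr, Nt, mu, sigma2, p_on, rng):
    """Framework I values masked by i.i.d. binary variables that equal 1 with probability p_on."""
    q = framework_one(Nr, Nt, mu, sigma2, rng)
    s = rng.random((Nr, Nt)) < p_on
    return q * s

rng = np.random.default_rng(7)
Pc_rich = spatial_power(framework_one(32, 32, mu=0.5, sigma2=0.05, rng=rng))
Pc_sparse = spatial_power(framework_two(32, 32, mu=0.5, sigma2=0.05, p_on=0.25, rng=rng))
fraction_active = np.count_nonzero(Pc_sparse) / Pc_sparse.size
```

Lowering `p_on` reduces the average fraction of non-zero entries while the normalization keeps the total channel power fixed, so the remaining entries become individually stronger.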

V. Capacity of Correlated MIMO Channels 

Towards this goal, we now briefly summarize some of the recent works on MIMO capacity. 
Prior to this summary, we state the channel state information (CSI) assumptions of this work. 



⁶ That is, $p_{ij}$ have to be normalized, as in (30), to ensure that $\rho_c = N_t N_r$.
⁷ The i.i.d. assumption on $\{p_{ij}\}$ is made to simplify further analysis.

A. Channel State Information

We assume a coherent receiver architecture. That is, the receiver has perfect CSI. This is 
possible in practice by estimating the channel at the receiver using training symbols over a 
dedicated training period that lasts a significant portion of the channel coherence duration. We 
further assume that the statistics of the channel do not change over a reasonably long duration 
so that they can be acquired perfectly at the transmitter. 



B. Ergodic Capacity 

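In this setting, the ergodic (or average) capacity at a transmit SNR of $\rho$ is given by [2]

$$C_{\rm erg}(\rho) = \sup_{\mathbf{Q} \,:\, \mathbf{Q} \ge 0, \; {\rm Tr}(\mathbf{Q}) \le \rho} E_{\mathbf{H}} \left[ \log_2 \det \left( \mathbf{I} + \mathbf{H}\mathbf{Q}\mathbf{H}^H \right) \right] \tag{33}$$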

where the optimization is over the set of trace-constrained, positive semi-definite matrices. While
uniform-power (or full rank) signaling is optimal when no CSI is available at the transmitter, it
is shown in [27], [45], [46] that the optimal $\mathbf{Q}$ to solve (33) has an eigen-decomposition

$$\mathbf{Q}_{\rm opt} = \mathbf{U}_t \boldsymbol{\Lambda}_{\rm opt} \mathbf{U}_t^H \tag{34}$$

where $\mathbf{U}_t$ is an eigen-matrix of $\mathbf{Q}_t = E[\mathbf{H}^H\mathbf{H}]$ and $\boldsymbol{\Lambda}_{\rm opt}$ is a positive semi-definite, diagonal
matrix obtained via a numerical search. Closed-form solutions for $\boldsymbol{\Lambda}_{\rm opt}$ are not known; however,
an iterative algorithm has been proposed in [27].

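For any correlated channel, this algorithm converges to beamforming (or rank-1 signaling) in
the asymptotically low-SNR regime and uniform-power signaling⁸ in the asymptotically high-SNR
regime. Thus, the low-SNR and the high-SNR ergodic capacities (denoted by $C_{\rm low}(\rho)$ and
$C_{\rm high}(\rho)$, respectively) are given by

$$C_{\rm low}(\rho) = C_{\rm erg}(\rho) \Big|_{\mathbf{Q}_{\rm opt} \text{ as } \rho \to 0} = E \left[ \log_2 \left( 1 + \rho \sum_i \left| \mathbf{H}_c[i, j_{\max}] \right|^2 \right) \right] \tag{35}$$

$$C_{\rm high}(\rho) = C_{\rm erg}(\rho) \Big|_{\mathbf{Q}_{\rm opt} \text{ as } \rho \to \infty} = E \left[ \log_2 \det \left( \mathbf{I}_{N_t} + \frac{\rho}{{\rm rank}(\mathbf{P}_c)} \, \mathbf{H}_c^H \mathbf{H}_c \right) \right] \tag{36}$$

where $j_{\max} = \arg\max_j \sum_i \mathbf{P}_c[i,j]$ corresponds to the dominant transmit eigen-direction. In
general, at an intermediate SNR, the optimal rank of $\mathbf{Q}$ is non-decreasing (as $\rho$ increases) with
precise estimates available for the transient SNRs when a particular rank signaling scheme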
becomes optimal [47]. 
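A Monte Carlo estimate gives a feel for the capacity statistics of a channel with independent, non-identically distributed entries. The sketch below makes a simplifying assumption that should be kept in mind: it uses uniform-power signaling across all transmit dimensions at every SNR rather than the optimized input covariance of (34), and it works directly in the canonical domain, which is legitimate for uniform power because the eigen-bases are unitary. Function names, the SNR value and the trial count are hypothetical.

```python
import numpy as np

def sample_canonical(P, rng):
    """One draw of a matrix with independent CN(0, P[i, j]) entries."""
    Z = (rng.standard_normal(P.shape) + 1j * rng.standard_normal(P.shape)) / np.sqrt(2.0)
    return np.sqrt(P) * Z

def capacity_samples(P, rho, n_trials, rng):
    """Per-realization log2 det(I + (rho / Nt) Hc Hc^H), i.e. uniform-power signaling."""
    Nr, Nt = P.shape
    caps = np.empty(n_trials)
    for n in range(n_trials):
        Hc = sample_canonical(P, rng)
        G = np.eye(Nr) + (rho / Nt) * (Hc @ Hc.conj().T)
        _, logdet = np.linalg.slogdet(G)   # natural-log determinant of a Hermitian PD matrix
        caps[n] = logdet / np.log(2.0)
    return caps

rng = np.random.default_rng(1)
P_iid = np.ones((4, 4))                    # i.i.d. channel with the normalization sum(P) = Nt * Nr
caps = capacity_samples(P_iid, rho=10.0, n_trials=2000, rng=rng)
C_erg_est, V_est = caps.mean(), caps.var()
```

Calling `capacity_samples` with a sparse power matrix and with a separable fit of that matrix, at the same SNR, gives empirical means and variances under the two models that can then be compared.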



'Without any loss in generality, we assume that no column of Pc has all zero entries. 
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C. Outage Capacity 

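It is also well-understood that the ergodic capacity is an insufficient metric to understand the
fundamental impact fading has on achievable data rates, and the notion of outage capacity [2],
[48] at an outage probability of $q$% is relevant. The outage capacity is the maximum rate that
is guaranteed for at least $(100 - q)$% of the channel realizations and is defined as

$$C_{\mathrm{out},q}(\rho) = \sup_{R > 0} R \quad \text{s.t.} \quad \Pr\left( \log_2 \det\left[ I + H Q H^H \right] < R \right) \le \frac{q}{100}. \qquad (37)$$

Many recent works have shown that Gaussian approximations to $C_{\mathrm{out},q}(\rho)$ with mean and
variance given by $C_{\mathrm{erg}}(\rho)$ and $V(\rho)$, the variance of capacity, are accurate in the large-system
limit; see [36] and references therein. Thus, in the large-system limit, $C_{\mathrm{out},q}(\rho)$ can be efficiently
approximated as

$$C_{\mathrm{out},q}(\rho) \approx C_{\mathrm{erg}}(\rho) - \sqrt{2 V(\rho)} \cdot \mathrm{erfc}^{-1}\left( \frac{2q}{100} \right) \qquad (38)$$

where $\mathrm{erfc}^{-1}(\cdot)$ denotes the inverse of the complementary error function.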
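The Gaussian approximation described above can be evaluated numerically with a few lines of code. The following is a minimal sketch, assuming only that capacity is modeled as a Gaussian random variable with mean $C_{\mathrm{erg}}(\rho)$ and variance $V(\rho)$; the function name and the numbers in the demonstration are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def outage_capacity_gaussian(c_erg: float, variance: float, q_percent: float) -> float:
    """Rate R with Pr(C < R) = q/100 when C is modeled as N(c_erg, variance).

    c_erg and variance stand for the mean and variance of capacity; how they
    are obtained (analysis or simulation) is outside this sketch.
    """
    if not 0.0 < q_percent < 100.0:
        raise ValueError("q_percent must lie strictly between 0 and 100")
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    model = NormalDist(mu=c_erg, sigma=variance ** 0.5)
    # The q% quantile of the Gaussian model is the q%-outage capacity.
    return model.inv_cdf(q_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical mean (bits/s/Hz) and variance, for illustration only.
    for q in (1, 10, 50):
        print(q, outage_capacity_gaussian(20.0, 4.0, q))
```

Under this model, two channels with the same mean capacity differ in outage capacity only through the variance: for $q < 50$, a larger variance lowers the outage capacity.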

VI. Comparative Study of Capacity of Kronecker and Canonical Models 

An important point to note from (38) is that the outage capacity is determined upon knowledge
of $C_{\mathrm{erg}}(\rho)$ and $V(\rho)$. The main focus of this section is thus on understanding $C_{\mathrm{erg}}(\rho)$ and $V(\rho)$
when a MIMO channel (with non-separable statistics) is fitted with a Kronecker model as in
Prop. 1. We denote by $C_{\mathrm{erg,can}}(\rho)$ and $C_{\mathrm{erg,kron}}(\rho)$ the means of capacity under the two models
and by $V_{\mathrm{can}}(\rho)$ and $V_{\mathrm{kron}}(\rho)$ the variances of capacity under these models.

In this section, we provide good estimates for the above quantities under certain conditions. 
While an analytical understanding of these quantities for all SNRs seems difficult, it is possible 
to obtain engineering intuition by studying the mismatches (between the two capacities) at the 
low- and the high-SNR extremes under a large-system assumption. Since the convergence to the 
large-system regime is typically fast (see e.g., [36] and references therein which point out that 
good agreement is possible even with 4 or 8 antennas) we expect this analysis to be useful in 
making meaningful conclusions in the finite antenna regime. 

A. Low-SNR Extreme

As noted in Sec. V, beamforming to the statistically dominant transmit eigen-mode (which
is the same irrespective of whether beamforming is done based on the statistics of $H_c$ or $H_k$) is
optimal from an ergodic capacity perspective in the low-SNR regime.

Footnote: Without loss in generality, let all the $N_r$ entries in the dominant column $\{P_c[i, j_{\max}],\ i = 1, \cdots, N_r\}$ be non-zero.

However, many works in the literature define the low-SNR regime imprecisely as "$\rho \to 0$." It is
useful to define a transient-SNR, $\rho_{\mathrm{low}}$, such that beamforming is capacity-optimal if $\rho < \rho_{\mathrm{low}}$.
Some works (see [8], [49] and references therein) define $\rho_{\mathrm{low}}$ implicitly in terms of means of
certain random variables that are related to $P_c$, but are nevertheless difficult to compute in
closed-form. In [47], using tools from random matrix theory, it is shown that

$$\rho_{\mathrm{low}} \approx \frac{1}{\sum_{i=1}^{N_r} P_c[i, j_{\max}]}. \qquad (39)$$

Capacity Computation: We first develop a general low-SNR characterization of MIMO capacity
in the canonical case, and then leverage this result to the Kronecker case. For this, we define

Plow, can • 



^ 1 j_ _ viVrE:=i(Pc[^,w])^ 

Plow, can ^-.^p^[,,,_]-^„' ^0 1+ 2E£iPc[.,J. ^ ' ^ ^ 

The importance of $\tilde{\rho}_{\mathrm{low,can}}$ is that $I(\rho)$, the average mutual information with statistical
beamforming, is given by

$$I(\rho) = \log_2(e) \cdot \rho \cdot \sum_{i=1}^{N_r} P_c[i, j_{\max}] \cdot (1 + o(1)), \quad \rho < \tilde{\rho}_{\mathrm{low,can}}. \qquad (41)$$

It should be noted that $C_{\mathrm{low}}(\rho)$ shows the same trends as $I(\rho)$. This is the content of the following
theorem.

Theorem 1: There exist positive constants $c, \epsilon > 0$, $m > 1/2$ (all independent of $P_c$, $N_t$ and
$N_r$) such that



l0g2(e)-Ml-^- — ) < Ce.g,can(p) < log,{e) ■ 6 



for all p= 0<6<c, (42) 

Z-/j=l cl^ 1 J ma,x\ 



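where $\kappa_c$ is defined as

$$\kappa_c \triangleq 1 + \frac{\sum_{i=1}^{N_r} \left( P_c[i, j_{\max}] \right)^2}{\left( \sum_{i=1}^{N_r} P_c[i, j_{\max}] \right)^2} \qquad (43)$$

and $\gamma_0$ is as in (40). Alternately, the above statement can be recast as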

$$C_{\mathrm{erg,can}}(\rho) = \log_2(e) \cdot \rho \cdot \sum_i P_c[i, j_{\max}] \, (1 + o(1)) \qquad (44)$$

'"Also, see [50] which points this out from a reconfigurable antennas point-of-view. 

"The difference between piow, can and piow, can is that while beamforming is exactly capacity-optimal below pbw.can, it is only 
near-optimal below piow,can- Nevertheless, note that if Pc[i,imax] = 1 for all i, piow,can reduces to and thus the trends of 
Plow, can are similar to that of piow.can- 
with the $o(1)$ factor converging to 0 as $N_r \to \infty$ and $\delta \to 0$.

Proof: See Appendix B. ■
From (27), it follows that if $\{P_c[i, j_{\max}]\}$ are non-zero, so are $\{P_k[i, j_{\max}]\}$. It is thus easy
to specialize Theorem 1 to the Kronecker case (associated with $\tilde{\rho}_{\mathrm{low,kron}}$) and compare the two
results.

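Capacity Comparison:

Theorem 2: Let the low-SNR regime be defined as $\rho < \rho_{\mathrm{low}}$ where

$$\rho_{\mathrm{low}} = \min\left( \tilde{\rho}_{\mathrm{low,can}},\ \tilde{\rho}_{\mathrm{low,kron}} \right). \qquad (45)$$

In this regime, the following conclusions hold for the dominant terms of the capacity quantities.

• (a) The dominant terms of the ergodic capacity under the two models are the same. In
particular, we have

$$C_{\mathrm{erg,can}}(\rho) = C_{\mathrm{erg,kron}}(\rho) = \log_2(e) \, \rho \cdot \sum_i P_c[i, j_{\max}]. \qquad (46)$$

• (b) The dominant terms of the variances satisfy

$$\frac{V_{\mathrm{can}}(\rho)}{(\log_2(e) \rho)^2} = \sum_i \left( P_c[i, j_{\max}] \right)^2, \qquad \frac{V_{\mathrm{kron}}(\rho)}{(\log_2(e) \rho)^2} = \sum_i \left( P_k[i, j_{\max}] \right)^2. \qquad (47)$$

• (c) Let $P_c$ be row-permuted such that $\{\sum_{k=1}^{N_t} P_c[i, k],\ i = 1, \cdots, N_r\}$ is arranged in
decreasing order. Further, if the entries of $P_c$ satisfy

$$\frac{P_c[i+1, j_{\max}]}{\sum_k P_c[i+1, k]} \le \frac{P_c[i, j_{\max}]}{\sum_k P_c[i, k]} \quad \text{for all } 1 \le i \le N_r - 1, \qquad (48)$$

then $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$ as $\rho \to 0$.

Proof: See Appendix C. ■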
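The comparison in Theorem 2 can be illustrated numerically. The sketch below is hedged in two ways: the Kronecker fit is taken to be the separable matrix built from the row and column sums of $P_c$ normalized by the total power (our reading of the marginal-sum fit, which is not restated in this section), and the power matrix in the demonstration is hypothetical. The helper names are ours.

```python
def kronecker_fit(p_c):
    """Assumed separable fit: (row sum) * (column sum) / (total power)."""
    row = [sum(r) for r in p_c]
    col = [sum(c) for c in zip(*p_c)]
    total = sum(row)
    return [[ri * cj / total for cj in col] for ri in row]


def dominant_column(p):
    """Index of the column with the largest sum (dominant transmit eigen-mode)."""
    col = [sum(c) for c in zip(*p)]
    return max(range(len(col)), key=col.__getitem__)


def low_snr_summary(p_c):
    """Dominant low-SNR mean and variance terms, up to powers of log2(e) * rho.

    Returns (mean_can, mean_kron, var_can, var_kron), i.e. the sums of the
    dominant-column entries and of their squares under the two models.
    """
    p_k = kronecker_fit(p_c)
    j = dominant_column(p_c)
    mean_can = sum(r[j] for r in p_c)
    mean_kron = sum(r[j] for r in p_k)
    var_can = sum(r[j] ** 2 for r in p_c)
    var_kron = sum(r[j] ** 2 for r in p_k)
    return mean_can, mean_kron, var_can, var_kron


if __name__ == "__main__":
    # Hypothetical 3x3 power matrix with equal row sums, for illustration only.
    example = [[3.0, 0.0, 0.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
    print(low_snr_summary(example))
```

Because the assumed fit preserves the column sums, the two mean terms always coincide, while the variance terms differ whenever the dominant column of $P_c$ is not proportional to the row sums.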
The condition in (48) implies that the fraction of power captured in the beamforming direction
by a receiver decreases in the same order as the total power captured by the receivers. For
example, in the case of regular channels (see Footnote 3), it is easy to check that (48) holds
trivially. In fact, for regular channels, it can be checked that

$$\frac{V_{\mathrm{can}}(\rho)}{V_{\mathrm{kron}}(\rho)} = \frac{N_r \sum_{i=1}^{N_r} \left( P_c[i, j_{\max}] \right)^2}{\left( \sum_{i=1}^{N_r} P_c[i, j_{\max}] \right)^2} \ge 1 \qquad (49)$$

due to the Cauchy-Schwarz inequality. It also seems like the condition in (48) is necessary to
ensure that $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$. For example, it can be checked that $V_{\mathrm{can}}(\rho) < V_{\mathrm{kron}}(\rho)$ in the
following $2 \times 2$ case where



1 A{l + e) 
1 A 



A < , e > (50) 
and (48) does not hold. Nevertheless, in the large-system regime, we have the following
conclusions for the probabilistic sparse frameworks I and II, introduced in Sec. IV-B.

Proposition 2: First, recall that I is a special case of II with $p = 1$.

• (a) The probability with which the condition in (48) holds converges to 1 as $\{N_t, N_r\} \to \infty$.
Hence, $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$ for "almost all" sparse scattering environments generated from
either framework.

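• (b) In particular, if $0 < m \le q_{ij} \le M$ with $E[q_{ij}] = \mu$ and $\mathrm{Var}(q_{ij}) = \sigma^2$, we have

$$1 \le \frac{V_{\mathrm{can}}(\rho)}{V_{\mathrm{kron}}(\rho)} \le \frac{1}{p} \cdot \frac{(M + m)^2}{4 M m}. \qquad (51)$$

More specifically, we have $\frac{V_{\mathrm{can}}(\rho)}{V_{\mathrm{kron}}(\rho)} \to \frac{1}{p} \left( 1 + \frac{\sigma^2}{\mu^2} \right)$.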
• (c) Equality in the lower bound of (51) is achieved when $H$ is i.i.d. If $N_t = N_r = N$ such
that $\frac{MN}{M+m}$ and $\frac{mN}{M+m}$ are integers, equality in the upper bound is approached as $N \to \infty$ by



pT_ N 

" M + m 



M + m m 



m M 



M 



M ■■■ 

M ■■■ 

■■■ m 











m 



(52) 



Proof: See Appendix D. ■
Note that the channel corresponding to Pc in (l52l) is such that has at least (l — ^) 
dominant eigenvalues whereas the eigenvalues of are all equal to M + m. It is surprising that 
channels that are 'near-well-conditioned' on both the transmitter and the receiver sides ($H_{\mathrm{iid}}$ and
the channel in (52)) could either maximize or minimize $\frac{V_{\mathrm{can}}(\rho)}{V_{\mathrm{kron}}(\rho)}$, depending on the distribution of
non-zero entries in $P_c$.

Discussion: The above results show that the ergodic capacities remain the same under the
canonical and the Kronecker models for all channels in the low-SNR regime. Thus, the dominant
factors in understanding outage capacity (rate vs. reliability trade-off) in (38) are the variances of
capacity. Since $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$ for almost all sparse channels, the outage capacity under the
Kronecker model is always steeper than the outage capacity under the canonical model (except
for i.i.d. $H_c$ where they are equally steep). Furthermore, the differential in steepness increases
as the channel becomes more sparse.

Footnote: Technically, the statement "almost all" has to be read as: "with probability 1 on the probability space corresponding to $\{p_{ij}\}$." Henceforth, we will not bother with this detail.

In other words, at high levels of operational reliability, the Kronecker model overestimates
capacity while it switches roles and underestimates capacity at low levels of reliability. However,
the smallness of the capacity values generally means that these trends are not prominent when
we plot outage capacity in the low-SNR regime. For example, Figs. 2-4 plot the cumulative
distribution function (CDF) of capacity (at −10, 10 and 30 dB SNRs) for three $8 \times 8$ channels
generated to portray: i) A typical sparse setting, ii) A setting with intermediate level of richness,
and iii) A typical rich setting. The spatial power matrices are given by
(53)



while for rich scattering, it is 



c, rich 
(54)



and P 

Note that $\sum_{ij} P_{c,\cdot}[i, j] = 64$ for all the three channels and the ratio of the largest to the smallest
transmit eigenvalue decreases from 100.4 to 9.31 and 5.45 as the channel becomes progressively
richer. The ratio of the largest to the smallest receive eigenvalue decreases from 1682 to 36.6
and 25.5 as the channel becomes richer. The channel realizations are generated as

$$H_c = H_{\mathrm{iid}} \odot \left( P_{c,\cdot} \right)^{1/2} \qquad (55)$$

where $H_{\mathrm{iid}}$ is an i.i.d. channel, $\odot$ denotes the element-wise product, and $(P_{c,\cdot})^{1/2}$ is the
element-wise square-root of the spatial power matrix.

Footnote: The spatial power matrices for this experiment have been generated artificially to mimic certain typical scattering environments, and not using the sparse frameworks of Sec. IV-B.

Fig. 2. Capacity CDFs of a sparse channel with canonical and Kronecker models at −10, 10 and 30 dB SNRs.
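The generation of channel realizations described above can be sketched as follows. This is a minimal illustration that assumes an i.i.d. channel with unit-variance circularly symmetric complex Gaussian entries, scaled entry by entry with the square-root of the spatial power matrix; the function names and the small power matrix used in the demonstration are hypothetical and are not the matrices used for Figs. 2-4.

```python
import math
import random


def complex_gaussian(rng):
    """One CN(0, 1) sample: independent real and imaginary parts of variance 1/2."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_channel(p_c, rng):
    """One realization whose (i, j)-th entry has variance p_c[i][j]."""
    return [[math.sqrt(p) * complex_gaussian(rng) for p in row] for row in p_c]


def empirical_power(p_c, trials, seed=0):
    """Average of |H[i][j]|^2 over independent realizations (sanity check)."""
    rng = random.Random(seed)
    rows, cols = len(p_c), len(p_c[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for _ in range(trials):
        h = sample_channel(p_c, rng)
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += abs(h[i][j]) ** 2
    return [[a / trials for a in row] for row in acc]


if __name__ == "__main__":
    # Hypothetical 2x2 power matrix with one zero (sparse) entry.
    print(empirical_power([[4.0, 0.0], [1.0, 0.25]], trials=20000, seed=1))
```

The empirical entry powers should approach the entries of the power matrix, and entries with zero power stay identically zero, which is how sparsity enters the realizations.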

Spectral Efficiency: Another characterization of low-SNR performance is in the context of
spectral efficiency [51] (equivalently, $C_{\mathrm{erg},\cdot}(\rho)$ vs. $\rho$ behavior). We now present the connections
between the canonical and the Kronecker models to the two key figures-of-merit in low-SNR
communications: i) Minimum energy per bit necessary for reliable communication, $\frac{E_b}{N_0}_{\min}$, and
ii) Wideband slope, $S_0$. For a multi-antenna channel, these two metrics are given by [51]

$$\frac{E_b}{N_0}_{\min} = \frac{\log_e(2)}{E\left[ \mathrm{Tr}\left( H Q H^H \right) \right]}, \qquad S_0 = \frac{2 \left( E\left[ \mathrm{Tr}\left( H Q H^H \right) \right] \right)^2}{E\left[ \mathrm{Tr}\left( (H Q H^H)^2 \right) \right]} \qquad (56)$$

where the input covariance matrix, $Q = \mathrm{diag}(Q[i])$, is low-SNR capacity-achieving and unit
trace constrained.

When there is only one dominant transmit eigen-mode, beamforming to this mode is spectral 
efficiency-optimal. If there are r dominant eigen-modes with r > 1, any Q that excites any 
of the r modes with any weightage is ergodic capacity-optimal. However, [51] points out that 
uniform-power signaling over these r modes is necessary to maximize spectral efficiency. We 
consider these two cases separately in the following theorem. 
Fig. 3. Capacity CDFs of a channel that has an intermediate level of richness with canonical and Kronecker models at −10, 10
and 30 dB SNRs.



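Theorem 3: If $r = 1$, the minimum energies per bit are given by

$$\frac{E_b}{N_0}_{\min,\,\mathrm{can}} = \frac{E_b}{N_0}_{\min,\,\mathrm{kron}} = \frac{\log_e(2)}{\sum_i P_c[i, j_{\max}]} \overset{(a)}{\to} \frac{\log_e(2)}{N_r p \mu} \qquad (57)$$

where the convergence in (a) is for the sparse framework II. An application of the Gaussian
moment factoring theorem [52] with the optimal input shows that

$$S_{0,\,\mathrm{can}} = 2 \cdot \frac{\left( \sum_i P_c[i, j_{\max}] \right)^2}{\sum_i \left( P_c[i, j_{\max}] \right)^2 + \left( \sum_i P_c[i, j_{\max}] \right)^2} \qquad (58)$$

$$S_{0,\,\mathrm{kron}} = 2 \cdot \frac{\left( \sum_i P_k[i, j_{\max}] \right)^2}{\sum_i \left( P_k[i, j_{\max}] \right)^2 + \left( \sum_i P_k[i, j_{\max}] \right)^2}. \qquad (59)$$

With framework II, we have

$$S_{0,\,\mathrm{can}} \to \frac{2 N_r p \mu^2}{(N_r p + 1) \mu^2 + \sigma^2}, \qquad S_{0,\,\mathrm{kron}} \to \frac{2 N_r}{N_r + 1}. \qquad (60)$$

If $r > 1$, the energies per bit are the same as in (57). The wideband slopes generalize to

$$S_{0,\,\mathrm{can}} \to \frac{2 N_r r \mu^2 p}{\mu^2 \left( N_r p + (r - 1) p + 1 \right) + \sigma^2}, \qquad S_{0,\,\mathrm{kron}} \to \frac{2 N_r r}{N_r + r}. \qquad (61)$$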
Proof: With $r = 1$, the conclusion about energy per bit is straightforward. The expression
for the wideband slope follows immediately from the fact proved in Appendix D

$$N_r p^2 \mu^2 \approx \sum_i \left( P_k[i, j_{\max}] \right)^2 \le \sum_i \left( P_c[i, j_{\max}] \right)^2 \approx N_r p \left( \mu^2 + \sigma^2 \right). \qquad (62)$$

For $r > 1$, see Appendix E. ■
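For the $r = 1$ case, both figures of merit depend only on the entries of the dominant column of the spatial power matrix. The sketch below evaluates them under our reading of the theorem, namely a minimum energy per bit of $\log_e(2)$ divided by the column sum and a wideband slope of twice the squared column sum divided by the sum of squares plus the squared column sum. The two columns in the demonstration are hypothetical: they share the same sum, and the second mimics the flattening produced by a marginal-sum fit.

```python
import math


def min_energy_per_bit(column):
    """Minimum Eb/N0 (linear scale) for beamforming along the dominant column."""
    return math.log(2.0) / sum(column)


def wideband_slope(column):
    """Wideband slope for r = 1: 2 (sum P)^2 / (sum P^2 + (sum P)^2)."""
    s1 = sum(column)
    s2 = sum(p * p for p in column)
    return 2.0 * s1 * s1 / (s2 + s1 * s1)


if __name__ == "__main__":
    # Hypothetical dominant columns with equal sums, for illustration only.
    canonical_column = [3.0, 2.0, 1.0]
    kronecker_column = [2.0, 2.0, 2.0]
    print(min_energy_per_bit(canonical_column), min_energy_per_bit(kronecker_column))
    print(wideband_slope(canonical_column), wideband_slope(kronecker_column))
```

Since the column sums agree, the minimum energies per bit coincide, while the more uneven column has the larger sum of squares and hence the smaller slope.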
It can be checked that $S_{0,\,\mathrm{can}} \le S_{0,\,\mathrm{kron}}$ in either case. However, this conclusion is not easily
reflected in Figs. 2-4 due to two reasons:

• $\frac{E_b}{N_0}_{\min}$, which is the same for both the channel models (in both $r = 1$ and $r > 1$ cases), is
the most important figure of merit at low-SNR and corresponds to first order variation in
ergodic capacity with SNR while $S_0$ corresponds to second order variation at low-SNR.

• The discrepancy in $S_0$ for the two models is small. In fact, we have



Ei (^(Pc[«, jmax]) - (Pfc[«, jmax])' 



CJ^ / 1 



I 'S'o, kron — 'S'o, can I < ^ ^. , ^^^2 ~ ^772 ' ^ ( ~ ) ^^^^ 



(E.Pc[^,Jmax])' Pl^' \N, 



for the r = 1 case, and 



cr2 f N r 

|^0,..on-^0,can|<^-O^^^^^-^) (64) 
for the $r > 1$ case. In the second case, the difference in wideband slopes is $O\left( \frac{1}{N_r} \right)$ for
finite values of $r$.

B. High-SNR Extreme

We now make two assumptions on the random matrix channel $H_c$ to aid in capacity analysis:
1) $N_t = N_r = N$, and 2) $\mathrm{rank}(H_c) = N$ a.s. Note that the second condition is equivalent to
assuming that none of $\{\sum_i P_c[i, j]\}$ and $\{\sum_j P_c[i, j]\}$ are zero. From the discussion following
Prop. 1, we also have $\mathrm{rank}(H_k) = N$ a.s.

In this setting, the capacity random variables under the two models are given by

$$C_{\mathrm{can}}(\rho, H) \triangleq \log_2 \det\left( I_N + \frac{\rho}{N} H_c H_c^H \right) \overset{(a)}{=} \log_2 \det\left( H_c H_c^H \right) + N \log_2\left( \frac{\rho}{N} \right) + O\left( \frac{1}{\rho} \right) \qquad (65)$$

$$C_{\mathrm{kron}}(\rho, H) \triangleq \log_2 \det\left( I_N + \frac{\rho}{N} H_k H_k^H \right) \overset{(b)}{=} \log_2 \det\left( H_k H_k^H \right) + N \log_2\left( \frac{\rho}{N} \right) + O\left( \frac{1}{\rho} \right) \qquad (66)$$

where in (a) we have used both Assumptions 1) and 2), and in (b), we have used the fact that
$\mathrm{rank}(H_k) = N = \mathrm{rank}(H_c)$ a.s. Hence the statistics of $C_{\mathrm{can}}(\rho, H)$ and $C_{\mathrm{kron}}(\rho, H)$ at high-SNR
are related to the moments of $\log_2 \det(H_c H_c^H)$ and $\log_2 \det(H_k H_k^H)$, respectively. We now
perform a large-system analysis of these random log-determinants.

Stochastic Approximation for the Canonical Case: In the case of $H_{\mathrm{iid}}$ ($P_c[i, j] = 1$ for all
$i, j$), this analysis is simplified by what is known as the Bartlett decomposition (or
bidiagonalization) of a sample covariance matrix [44], [53], [54]. The decomposition states that there exist
independent random variables $Z_i$ on some probability space such that

$$Z \triangleq \det\left( H_{\mathrm{iid}} H_{\mathrm{iid}}^H \right) \sim \prod_{i=1}^{N} Z_i, \qquad Z_i \sim \sum_{j=i}^{N} |H_{\mathrm{iid}}[i, j]|^2 \sim \frac{1}{2} \chi^2\left( 2(N - i + 1) \right) \qquad (67)$$

where $\chi^2(2k)$ is a central chi-squared random variable with $2k$ degrees of freedom.
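The Bartlett decomposition makes the mean log-determinant of an i.i.d. channel easy to evaluate. The sketch below rests on two standard facts that we state as assumptions of the example: one half of a chi-squared variable with $2k$ degrees of freedom is a unit-scale Gamma variable with shape $k$, and the mean of the natural logarithm of such a variable is the digamma function at $k$, which for integer $k$ equals the harmonic number $H_{k-1}$ minus the Euler-Mascheroni constant. The function names are ours.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j<k} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def mean_log2_det_exact(n):
    """E[log2 det(H H^H)] for an n x n i.i.d. CN(0,1) channel via the product form."""
    return sum(digamma_int(k) for k in range(1, n + 1)) / math.log(2.0)


def mean_log2_det_monte_carlo(n, trials, seed=0):
    """Monte Carlo estimate using independent Gamma(k, 1) factors, k = 1..n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(math.log2(rng.gammavariate(k, 1.0)) for k in range(1, n + 1))
    return total / trials


if __name__ == "__main__":
    n = 4
    print(mean_log2_det_exact(n), mean_log2_det_monte_carlo(n, trials=20000, seed=3))
```

No matrix needs to be generated or factorized: the determinant is sampled directly as a product of $N$ independent scalar variables, which is what makes the i.i.d. case tractable.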

On the other hand, computing $\log_2 \det(H_c H_c^H)$ in closed-form is extremely difficult because
$\{P_c[i, j]\}$, in general, possess no structure and a Bartlett-type decomposition for $\det(H_c H_c^H)$
is not known. Nevertheless, a tight stochastic approximation for $C_{\mathrm{erg,can}}(\rho)$ is still possible and
for this, we need the following notation from [55].

We say that a random variable $X_2$ upper bounds a random variable $X_1$ (and denote it by
$X_1 \preceq X_2$) if

$$\Pr(X_1 \le x) \ge \Pr(X_2 \le x) \quad \text{for all } x \in \mathbb{R}. \qquad (68)$$

'"^The first condition can be relaxed with some advanced random matrix theory techniques that are out-of-scope here. If this 
is done and we obtain results for arbitrary Nt and Nr, then the second condition can be assumed without any loss in generality 
since we can always ignore those columns/rows with zero power. Nevertheless, for simplicity of analysis, we assume both 
conditions. 
The following lemma provides a statistical "bound" and a useful stochastic approximation for
$\det(H_c H_c^H)$.

Lemma 5 (Girko): Let $H[i, j]$ be independent and distributed as $\mathcal{CN}(0, p_{ij})$. Then,

$$Z \cdot \prod_{i=1}^{N} \min_j p_{ij} \preceq \det\left( H H^H \right) \preceq Z \cdot \prod_{i=1}^{N} \max_j p_{ij} \qquad (69)$$

where $Z$ is as in (67). Moreover, there exist independent random variables $\widetilde{Z}_i$, $i = 1, \cdots, N$ on
some probability space such that $\det(H H^H)$ can be well-approximated as




N 



Fig. 5. Comparison of means of $Z_1$ and $Z_2$ (as a function of $N$) for a typical scattering environment and averaged over many scattering
environments.

Numerical studies indicate that the approximation in Lemma 5 is close for a large class of
random matrices even for small values of $N$. Furthermore, this approximation gets more accurate
as $N$ increases for a large class of random matrices. This fact is illustrated in Fig. 5 where we
plot $E[Z_1]$ and $E[Z_2]$ as a function of matrix dimension $N$ with

^ fi w 7iP\ 

Zi = log2det(HH^) and = J^og, f ^^^^LLi^j . (71) 

The first set corresponds to a typical scattering environment where $\{p_{ij}\}$ are chosen i.i.d. from
a uniform distribution on $[0, 1]$ (in particular, $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$). The second set corresponds
to a smoothed version of the first set where we also average over many different scattering
environments. Here, we have averaged over 5000 independent scattering environments and the
plot shows that the approximation is very accurate on average. In the rest of the paper, we
assume that the approximation in (70) is accurate. Nevertheless, its rigorous use is contingent
on further studies that have to establish its preciseness. This will be the subject of future work.
Capacity Computation and Comparison:

Theorem 4: With the sparse frameworks of Sec. IV-B, good estimates can be obtained for
ergodic capacity in the high-SNR extreme.

• (a) The ergodic capacity under the Kronecker model converges to



^erg, kron (p) 

N 



Ky^on = El^g^l^^l^^^^) (73) 

"tt V 22kiPk,i J 
whereas under the canonical model, it is well-approximated (with the approximation ap- 
proaching an equality as iV — >^ oo following the previous discussion) by 

Cerg,can(p) ^ iVlog^ ^ + log^ + i^can + O (74) 

• (b) In the large-system regime, the following expressions are true: 

^ / \ / \ 1 / -^^row pow ' -^^col pow \ /rj/s 
Cerg, can (p) - Cerg, kron (p) ^ — logg "^TT 1 (76) 

where $\mathrm{AM}_{\cdot}$ and $\mathrm{GM}_{\cdot}$ correspond to the arithmetic and geometric means of row and column
powers of $P_c$. Further, we also have

$$0 \le C_{\mathrm{erg,can}}(\rho) - C_{\mathrm{erg,kron}}(\rho) \le 2 N \log_2(N). \qquad (77)$$

• (c) Equality in the lower bound holds if and only if $H_c$ is regular. While it seems difficult
to construct a $P_c$ that meets the upper bound, the following choice is order-optimal and
results in $C_{\mathrm{erg,can}}(\rho) - C_{\mathrm{erg,kron}}(\rho) \sim N \log_2(N)$ as $N \to \infty$:

$$P_c = \mathrm{diag}\Big[ N^2 - N + 1,\ \underbrace{1, \cdots, 1}_{N-1} \Big]. \qquad (78)$$

Proof: See Appendix O ■ 
Variance of Capacity: Closed-form results are difficult to obtain for $V_{\cdot}(\rho)$ as $\rho \to \infty$. However,
numerical studies indicate that for most scattering environments $\sqrt{V_{\mathrm{can}}(\rho)}$ and $\sqrt{V_{\mathrm{kron}}(\rho)}$
are sub-dominant when compared with $C_{\mathrm{erg,can}}(\rho)$ and $C_{\mathrm{erg,kron}}(\rho)$, respectively. Thus, for a
typical scattering environment, the outage capacities are primarily determined by $C_{\mathrm{erg,can}}(\rho)$
and $C_{\mathrm{erg,kron}}(\rho)$. The smoothing effect of the Kronecker model (as can be seen from (77)), the
low-SNR trends of $V_{\cdot}(\rho)$, and numerical studies (see Figs. 2-4) lend credence to the following
conjecture, proving which will be the subject of future work.

Conjecture 1: The following are true for a large class of channels in the medium- to high-SNR
regime:

$$\frac{\sqrt{V_{\mathrm{can}}(\rho)}}{C_{\mathrm{erg,can}}(\rho)} \overset{N \to \infty}{\longrightarrow} 0, \qquad \frac{\sqrt{V_{\mathrm{kron}}(\rho)}}{C_{\mathrm{erg,kron}}(\rho)} \overset{N \to \infty}{\longrightarrow} 0, \qquad \text{and} \quad V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho). \qquad (79)$$

Discussion: From (76), we first note that the mismatch accrued by the Kronecker model increases
as the ratio of arithmetic and geometric means of the row and the column powers increases.
The ratio of arithmetic and geometric means is a measure of the homogeneity of the vector
(under consideration) or lack of disparities [56], [57]: the regularity of $P_c$ in our context. That
is, the smaller the ratio, the more regular the channel and vice versa. Thus, we see that the more
non-regular the channel, the larger the mismatch with the Kronecker model. This conclusion
is reflected in the structure of the choices of $P_c$ (in Theorem 4) that lead to a large and a
small mismatch. It is also reflected in Figs. 2-4 where the channels become more regular as
they become richer (this is because both the transmit and the receive sides become more
well-conditioned as the channel becomes richer), and the mismatch between the Kronecker and the
canonical models decreases.

We also note the following trends. In the case of non-regular channels, the fact that
$C_{\mathrm{erg,can}}(\rho) > C_{\mathrm{erg,kron}}(\rho)$ and the sub-dominance conjecture of $V_{\cdot}(\rho)$ imply that the Kronecker model
underestimates capacity, confirming the observations made in recent measurement campaigns [18],
[20]-[25]. Note that the SNR range of most of these observations lies between 10 and 20 dB,
which can be viewed as the high-SNR regime. The choice of the SNR range also explains why
the popular belief on the decreasing probability of overestimation (see e.g., [26, Footnote 5]) has
come about. The case of regular (or near-regular) channels in the high-SNR regime has a behavior
similar to that of channels in the low-SNR regime. Finally, note that the theory developed in this
work is useful in the context of the probabilistic sparse framework of Sec. IV-B where it holds
with probability 1. Since the class of sparse channels forms the most predominant class in the
space of all possible channels (Fig. 1), the utility of this theory is immense.

Footnote: For example, in the i.i.d. case, it can be seen that $C_{\mathrm{erg,can}}(\rho) = C_{\mathrm{erg,kron}}(\rho) = O(N)$ while $V_{\mathrm{can}}(\rho) = V_{\mathrm{kron}}(\rho) = O(\log(N))$ [54].

VII. Conclusion 

In this paper, we have unified existing statistical models for spatially correlated multi-antenna 
channels by considering a canonical decomposition of the channel along the transmit and/or the 
receive eigen-bases. This framework generalizes the Kronecker model, the virtual representation 
and the Weichselberger model, and as a by-product develops two other classes of statistical 
models. In addition, we have developed an abstract framework to model spatial sparsity that has 
been observed in many recent measurement campaigns. 

These campaigns have also demonstrated that the Kronecker model results in misleading 
estimates for the capacity of realistic scattering environments. However, the reasons for these 
observations have not been well-understood so far. In this work, we have rigorously established 
the connection between spatial sparsity of the true channel, the non-regularity of the sparsity 
structure, and the impact they have on the capacity estimates provided by a Kronecker model 
fit. The Kronecker model fit uses the marginal sum statistics and this spreads the sparse DoF 
in the spatial domain. The consequent redistribution of the channel power is responsible for the 
mismatch in capacity estimation. In particular, we have shown that in the case of non-regular 
channels, the Kronecker model underestimates capacity in the medium- to high-SNR regime. 
On the other hand, in the low-SNR regime and regular channels in the high-SNR regime, the 
Kronecker model overestimates capacity at high levels of operational reliability and vice versa. 

Possible extensions to this work include the development of a more systematic framework
for the generation of correlated/sparse multi-antenna channels, the impact sparsity has on the
over/underestimation of capacity and reliability, establishing rigorously the approximation in
Lemma 5 and Conjecture 1, computation of closed-form expressions for the mean and the
variance of capacity under the canonical and the Kronecker models at general SNRs, and
understanding the impact on capacity of different channel power normalizations that are
consistent with physical intuition.



Appendix 

A. Proof of Lemma 4

Consider the matrix $H_c = U_r^H H U_t$. From Assumptions 1 and 2, we can write $H_c = U_r^H H_t$.
Then, the cross-covariance of the columns of $H_c$ (denoted by $\{h_{c\,i}\}$) satisfies

$$E\left[ h_{c\,i} h_{c\,j}^H \right] = U_r^H E\left[ h_{t\,i} h_{t\,j}^H \right] U_r = 0 \quad \text{for all } i \ne j, \qquad (80)$$

which follows from the column uncorrelatedness of $H_t$ in Lemma 2. Similarly, from Assumptions
3 and 4, we can write $H_c = H_r U_t$. Then, the cross-covariance of the rows of $H_c$ (denoted by
$\{g_{c\,i}\}$) satisfies

$$E\left[ g_{c\,i}^H g_{c\,j} \right] = U_t^H E\left[ g_{r\,i}^H g_{r\,j} \right] U_t = 0 \quad \text{for all } i \ne j, \qquad (81)$$

which follows from the row uncorrelatedness of $H_r$ in Lemma 3. Thus, the columns and the
rows of $H_c$ are uncorrelated. This necessarily implies that all entries of $H_c$ are uncorrelated. ■



B. Proof of Theorem 1

Preliminaries: The following result concerning the tail probabilities of weighted sums of i.i.d.
random variables leads us towards the estimation of $I(\rho)$.

Lemma 6 (Lanzinger and Stadtmueller, [58]): Consider i.i.d. random variables X, Xi, X2, ■ ■ ■ 
with E[X] = 0, E[X^] = a^. Let /? > and z/ > 2 with ElX"] < 00. Define the weighted sum 



= ^ tkXk, h > 0, and cr^ = (Tq ^ t 



k=l 



k=l 



Also, suppose for some a > 1, 



max tfc 

l<fc<n 



< for all n. 

In 



Then, lim e 

£^0+ 



^^.(/3+i/2)-2 Pr(|T„| >enV) 



E[\Ur 



v/2-1 



(82) 



(83) 



(84) 



n=l 



j.(/5+l/2)-l 

where A/" is a standard Gaussian random variable. ■ 
Note that the conclusion of Lemma 6 can be suitably modified for the case where $\epsilon$ does not
tend to $0^+$, but is sufficiently small, by increasing the right-hand side of (84) appropriately. The
crucial point is that this would not alter our conclusion since the above modification can be done
while keeping the right-hand side in (84) still finite.

Application of Lemma^ Lemma |6] is applied in our setting as follows. Let Xj = |Hijd[«, jmax]|^~ 
= Pe[«, jmax], = Ya=i ( | Hc[i, jmax] | ^ " Pc[«, jmax]), /? = |, = 2 and n = A^r- Then 
Lemma [6] implies that 

IT 



> T] 



n 



n 



1 

< — < oo 



(85) 



for T] appropriately small. The conclusion in (l85l) implies that there exists m > 1/2 and > 
such that 



Nr 



Pr 



> 7] 



i=l 



^ iV,^ (Pe[z,J^ax]) J < 



(86) 
Proof of Theorem: Let $Y = Y(N_r)$ denote the random variable $\sum_{i=1}^{N_r} |H_c[i, j_{\max}]|^2$. Setting
• ^ ■ i with 7 = 1 + ■ ^'^ry^'t'^^r ^ and using m, we get 



Pr pY > 



1 — 7] 



Nr 



Pr pY-p5^P,[z 



i=l 



1 — T] 



Nr 

P^Pc[i, jmax] 
i=l 



(87) 



< Pr p|Y - E[Y]\ > TIP 



2^ 1 
^ A^r^ (Pc[i, jmax]) 1 < ^2£jY2m " 



An upper bound to $I(\rho)$ follows easily from the log-inequality:

$$I(\rho) \triangleq E\left[ \log_2(1 + \rho Y) \right] \le \rho \log_2(e) E[Y] = \rho \log_2(e) \sum_i P_c[i, j_{\max}].$$
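The tightness of the linear upper bound on the beamforming mutual information at low SNR can be checked by simulation. The sketch below assumes independent complex Gaussian entries in the dominant column, so that each squared magnitude is exponentially distributed with mean equal to the corresponding power entry; the column and the SNR value in the demonstration are hypothetical.

```python
import math
import random


def beamforming_mi(column, rho, trials, seed=0):
    """Monte Carlo estimate of E[log2(1 + rho * Y)], Y = sum of exponentials.

    Each term of Y is exponential with mean given by the matching entry of
    `column`; zero-power entries contribute nothing.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. the reciprocal of the mean.
        y = sum(rng.expovariate(1.0 / p) for p in column if p > 0.0)
        acc += math.log2(1.0 + rho * y)
    return acc / trials


def linear_upper_bound(column, rho):
    """Upper bound rho * log2(e) * E[Y] obtained from log(1 + z) <= z."""
    return rho * math.log2(math.e) * sum(column)


if __name__ == "__main__":
    column, rho = [3.0, 2.0, 1.0], 0.01  # hypothetical values
    print(beamforming_mi(column, rho, trials=20000, seed=7), linear_upper_bound(column, rho))
```

At this SNR the estimate falls slightly below the bound, and the gap shrinks further as the SNR is reduced, which is the behavior that the lower bound derived next quantifies.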

We now establish a tight lower bound for $I(\rho)$:

pY 



E [log2 (1 + pY]] W ^ 



log2(e) 



E 



l + pY_ 
.1 + pY 



1 — 1] 



+ E 



1 + pY 



1 — T] J 



Zi 



Z2 



> E 



(pY-p^Y^) x[pY<^ 



(88) 

(89) 
(90) 

(91) 



^3 



E [pY] - E 



pY x{pY> 



1 — T] 



E 



p'Y^ x[pY< 



1—7] 



^5 



where (a) follows from the inequality $\log_e(1 + z) \ge \frac{z}{1 + z}$ and (b) follows from using the inequality
$\frac{1}{1 + z} \ge (1 - z)$. An application of the Cauchy-Schwarz inequality shows that

' JEm E[Y'] 



E[log2(l + pY)] 
log2(e) 



> P-^Pc[i,jr, 



E[Y] 



(92) 



Ze 



Now, the quantity $Z_6$ makes meaningful sense as a lower bound to $I(\rho)$ only if it is positive.
Plugging in the expression for $\rho$, we see that this can be ensured if $\eta$ is constrained to $(0, 1/2]$.
Further, $\eta = 1/2$ maximizes the lower bound to $I(\rho)$. Evaluating $E[Y^2]$, substituting the value
of $\kappa_c$ and noting that $\rho$



J r ■ — T for (5 < where 70 is as in (l40l) . we get 



/(.),.og,(e)...(l-?;|^-^^). 



(93) 



"The choice of 1/2 for the upper bound of the vahd interval of rj is more or less arbitrary and we have not optimized over 
this choice. 
To complete the proof, we observe from [47] that beamforming is the optimal signaling strategy
for all $\rho < \rho_{\mathrm{low,can}}$ and there exists a constant $d > 0$ (independent of $P_c$, $N_r$ and $N_t$) such that

$$\rho_{\mathrm{low,can}} \ge \frac{d}{\sum_i P_c[i, j_{\max}]}. \qquad (94)$$

The constant $c$ in the statement of the theorem can be chosen to be $\min(d, 1/\gamma_0)$. It is important
to note that the tightness of the upper bound in (88) and the lower bound in (89) critically hinge
on the low-SNR assumption. Thus the theorem is complete. ■



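C. Proof of Theorem 2

• (a) From (35), the ergodic capacities in the low-SNR regime are given by

$$C_{\mathrm{erg,can}}(\rho) = E\left[ \log_2 \left( 1 + \rho \sum_i |H_c[i, j_{\max}]|^2 \right) \right] \qquad (95)$$

$$C_{\mathrm{erg,kron}}(\rho) = E\left[ \log_2 \left( 1 + \rho \sum_i |H_k[i, j_{\max}]|^2 \right) \right] \qquad (96)$$

where $j_{\max} = \arg\max_j \sum_i P_c[i, j] = \arg\max_j \sum_i P_k[i, j]$. Estimating these quantities is a
straightforward consequence of Theorem 1.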
(b) For the variance, we have 

\2l /mn ^1 , ^r\^\2 



Kan(p) = i?[(l0g2(l+pY))V(^[l0g2(l + PY)])^ 



where Y = XlSi |Hc[i, jm^ 

Kan(p) < {\0g^{e)f-5^ 



Proceeding along similar lines as in App. |Bl we have 

2N 



1 - 



2 yj'^^c ^ 



( 



2 X2 



Kan(p) > (l0g2(e))^-5 



- 1 



7o 



25 ■ E[Y=^ 



(97) 

(98) 
\ 



v 



where the constants are as in the statement of Theorem 1. Thus, we can recast $V_{\mathrm{can}}(\rho)$ as

$$V_{\mathrm{can}}(\rho) = (\log_2(e))^2 \cdot \rho^2 \cdot \sum_i \left( P_c[i, j_{\max}] \right)^2 \cdot (1 + o(1)) \qquad (99)$$

where the $o(1)$ factor in the above expression converges to 0 as $N_r \to \infty$ and $\rho \to 0$. The
critical assumption in the above proof is that $H_c[i, j]$ are independent random variables.
Thus, the same proof technique can be adapted to compute $V_{\mathrm{kron}}(\rho)$ as well.

• (c) The relationship between $V_{\mathrm{can}}(\rho)$ and $V_{\mathrm{kron}}(\rho)$ is not obvious. For this, we need the
following result on the monotonicity of ratios of means [55, pp. 129-130].
Lemma 7 (Marshall, Olkin and Proschan): Let $x = [x_1, \cdots, x_n]$ and $y = [y_1, \cdots, y_n]$ be
two vectors such that $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$. If $\frac{y_i}{x_i}$ is decreasing in $i$ and $x_1 \ge \cdots \ge x_n > 0$,
then $x$ is majorized by $y$, and

$$\left( \frac{\sum_{i=1}^{n} x_i^r}{\sum_{i=1}^{n} y_i^r} \right)^{1/r}$$

is decreasing in $r$ for $r > 0$. ■
Application of Lemma 7: We set

P'= Z^fc=lPcF>JmaxJ 

and $n = N_r$. From the assumption in the statement of the theorem, note that $x_i > 0$ for all
$i$ and are in decreasing order. The fact that $\frac{y_i}{x_i}$ is decreasing is a consequence of (48). A
straightforward consequence of Lemma 7 is that $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$. ■



D. Proof of Prop. 2

• (a) With framework II, the main goal is to compute the probability of failure of (|48|) . Towards 
this computation, we first condition upon and Pi+ij^^^ (in particular, s, and q,) where 

i is such that 1 < i < A^,. — 1. Define the conditional probability pf. 



max max / 

\Nr-l, 



(103) 



Hence, the conditional probability of failure of (l48l) is 1 — HiJi" ^ We intend to 
show that the above probability converges to as A^ ^ oo. 

Without loss in generality, we can assume that si^j^^^ = Si+i^j^^^ = 1 (Otherwise, pi = 0.). 
Similarly, we can assume that {qi^j,-^^^} is decreasing in i. Note that pi can be written as 

p. = Pr > 0^ where (104) 

Zatj = gj+ij^,, X qi,kSi,k - gjjmax X] (li+i,kSi+i,k. (105) 

max max 

Using the independence of {qi,k] and {si^k} and their statistics, it can be checked that 



(?i + ljmax - PP < 0, (106) 



Var(— ^ .0. (107) 
That is, the normalized random variable hardens around its mean (which is negative) as $N_t \to \infty$
and hence, $p_i$ converges to 0. Averaging over the conditioning variables, we see that for
"almost all" sparse scattering environments, the condition in (48) holds and hence,
$V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$.

(b) We now compare the dominant terms of the variances of capacity with the two models. 
We have 



Nr (l0g2(e)p)' Nr ( ZuPK 

NtNr 



(108) 



After using (|27|) . we can also write Vkron (p) in terms of {pij} as 

Jmax 

Fkron(p) V J ' Nr \ Nt 



Nr (log2(e)p) f T,kiPk, 

NtNr 



(109) 



In the large-system regime, since $\{p_{ij} = q_{ij} s_{ij}\}$ and $\{q_{ij}\}$ is a realization from an i.i.d.
family of mean $\mu$ and variance $\sigma^2$, we can use the law of large numbers [59] to check that

Kan(p) , E[{q,,)'] Vkron (P) ^ ^ 

Nr {log,{e)pf ^ {Eh,]fp' Nr {log,{e)pf ^ 
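The fact that $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$ follows from the Cauchy-Schwarz inequality. The upper bound for $V_{\mathrm{can}}(\rho)/V_{\mathrm{kron}}(\rho)$ follows from the reverse Cauchy-Schwarz inequality [60, equation 24, p. 208] due to Cassels, which is stated here for convenience.

Lemma 8: If $x = [x_1, \cdots, x_n]$, $y = [y_1, \cdots, y_n]$ and $w = [w_1, \cdots, w_n]$ are positive $n$-tuples such that $0 < m_1 \le x_i \le M_1$ and $0 < m_2 \le y_i \le M_2$ for all $i$ with $m_1 m_2 < M_1 M_2$, then

$$\frac{\left(\sum_{i=1}^{n} x_i^2 w_i\right)\left(\sum_{i=1}^{n} y_i^2 w_i\right)}{\left(\sum_{i=1}^{n} x_i y_i w_i\right)^2} \le \frac{\left(m_1 m_2 + M_1 M_2\right)^2}{4 m_1 m_2 M_1 M_2}.$$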

■ 
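The weighted reverse Cauchy-Schwarz bound of Lemma 8 can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the function names, the ranges $[m_1, M_1] = [0.5, 2]$ and $[m_2, M_2] = [1, 3]$, and the weight range are illustrative assumptions.

```python
import random

def cassels_ratio(x, y, w):
    """Return (sum w x^2)(sum w y^2) / (sum w x y)^2 for positive tuples."""
    sxx = sum(wi * xi * xi for xi, wi in zip(x, w))
    syy = sum(wi * yi * yi for yi, wi in zip(y, w))
    sxy = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    return sxx * syy / (sxy * sxy)

def cassels_bound(m1, M1, m2, M2):
    """Upper bound (m1 m2 + M1 M2)^2 / (4 m1 m2 M1 M2) from Lemma 8."""
    return (m1 * m2 + M1 * M2) ** 2 / (4.0 * m1 * m2 * M1 * M2)

def check_lemma8(trials=1000, n=8, seed=0):
    """Draw random positive tuples in the assumed ranges and test both bounds."""
    rng = random.Random(seed)
    m1, M1, m2, M2 = 0.5, 2.0, 1.0, 3.0   # hypothetical ranges
    bound = cassels_bound(m1, M1, m2, M2)
    for _ in range(trials):
        x = [rng.uniform(m1, M1) for _ in range(n)]
        y = [rng.uniform(m2, M2) for _ in range(n)]
        w = [rng.uniform(0.1, 1.0) for _ in range(n)]
        ratio = cassels_ratio(x, y, w)
        # Lower bound 1 is Cauchy-Schwarz; the upper bound is Lemma 8.
        if not (1.0 - 1e-12 <= ratio <= bound + 1e-12):
            return False
    return True
```

The lower limit of 1 corresponds to the Cauchy-Schwarz step used for $V_{\mathrm{can}}(\rho) \ge V_{\mathrm{kron}}(\rho)$, and the upper limit to the reverse inequality used for the ratio.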

(c) Equality in the lower bound is possible if and only if $q_{i,j}$ is constant with probability 1 and $p = 1$. That is, $H_c$ and $H_k$ are i.i.d. With $P_c$ as in (52), it can be checked that

Kan(p) _{M + mf 1 + ^ ^^00 (M + mf 



\ 2mN ) 

Thus the proposition is complete. 



(112) 



E. Proof of Theorem 2

When r > 1, we assume that the r dominant columns have been relabeled as columns 1 
through r. We then have 

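$$\left(\frac{E_b}{N_0}\right)_{\min,\,\mathrm{can}} = \left(\frac{E_b}{N_0}\right)_{\min,\,\mathrm{kron}} = \frac{r \log_e(2)}{\sum_{i=1}^{N_r} \sum_{j=1}^{r} P_c[i,j]} = \frac{\log_e(2)}{\sum_{i=1}^{N_r} P_c[i, j_{\max}]} \qquad (113)$$

with the last equality following because all the $r$ columns have the same sums. Using the uniform input over $r$ modes and the Gaussian moment factoring theorem, the wideband slopes can be checked to be

$$S_{0,\mathrm{can}} = \frac{2 \left(\sum_{i=1}^{N_r} \sum_{j=1}^{r} P_c[i,j]\right)^2}{\sum_{i=1}^{N_r} \sum_{j_1=1}^{r} \sum_{j_2=1}^{r} P_c[i,j_1] P_c[i,j_2] + \sum_{j=1}^{r} \left(\sum_{i=1}^{N_r} P_c[i,j]\right)^2}, \qquad (114)$$

$$S_{0,\mathrm{kron}} = \frac{2 \left(\sum_{i=1}^{N_r} \sum_{j=1}^{r} P_k[i,j]\right)^2}{\sum_{i=1}^{N_r} \sum_{j_1=1}^{r} \sum_{j_2=1}^{r} P_k[i,j_1] P_k[i,j_2] + \sum_{j=1}^{r} \left(\sum_{i=1}^{N_r} P_k[i,j]\right)^2}. \qquad (115)$$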

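Using the law of large numbers appropriately, we have

$$S_{0,\mathrm{can}} \approx \frac{2 N_r r \mu^2 p}{\mu^2 \left(N_r p + (r-1) p + 1\right) + \sigma^2}, \qquad S_{0,\mathrm{kron}} \approx \frac{2 N_r r}{N_r + r}. \qquad (116)$$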
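As an illustration of the wideband-slope expressions in (114) and (115), the sketch below evaluates the slope from a variance profile restricted to the $r$ excited columns, and builds a Kronecker-type profile as the outer product of row and column sums divided by the total power. The helper names and the small $2 \times 2$ profile are hypothetical and chosen only so that the numbers can be verified by hand.

```python
def wideband_slope(P):
    """Slope 2 (sum P)^2 / (sum_i rowsum_i^2 + sum_j colsum_j^2) for an
    N_r x r variance profile P of the excited columns."""
    row_sums = [sum(row) for row in P]
    col_sums = [sum(col) for col in zip(*P)]
    total = sum(row_sums)
    denom = sum(r * r for r in row_sums) + sum(c * c for c in col_sums)
    return 2.0 * total * total / denom

def kronecker_profile(P):
    """Separable profile with the same row and column sums as P
    (an assumed form of the Kronecker fit)."""
    row_sums = [sum(row) for row in P]
    col_sums = [sum(col) for col in zip(*P)]
    total = sum(row_sums)
    return [[ri * cj / total for cj in col_sums] for ri in row_sums]

def restrict(P, cols):
    """Keep only the listed column indices of P."""
    return [[row[j] for j in cols] for row in P]

# Hypothetical non-separable profile; column 1 is dominant (sum 5 > 3).
P_c = [[3.0, 1.0],
       [0.0, 4.0]]
P_k = kronecker_profile(P_c)

s_can = wideband_slope(restrict(P_c, [1]))    # 50/42
s_kron = wideband_slope(restrict(P_k, [1]))   # 50/37.5 = 4/3
```

In this toy case the Kronecker profile spreads the dominant column's power evenly over the rows and its slope equals $2 N_r r/(N_r + r) = 4/3$, whereas the non-separable profile gives a smaller slope.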



F. Proof of Lemma 5

See [44, Chap. 2, p. 104] for a proof of the first statement. For the statement on determinant approximation, we start with [61, p. 35, 39] which states that $\det(H H^H)$ can be decomposed as a product of independent random variables, where



(117) 



i N 

i=l 1=1 

and the matrix $\Theta = \{\theta[i,j]\}$ is a unitary random matrix independent of $H$. Note that

$$|\eta_{i,j}|^2 = \sum_{l} |H[i,l]|^2 \, |\theta[l,j]|^2 + \sum_{l_1 \ne l_2} H[i,l_1] \, H[i,l_2]^{*} \, \theta[l_1,j] \, \theta[l_2,j]^{*} \qquad (118)$$

and using the facts that the entries of a random unitary matrix are asymptotically self-averaging (that is, zero mean in a "statistical" sense) and the rows and columns have unit norm, we have the following approximation for $|\eta_{i,j}|^2$:



(119) 



This leads to the approximation for Zj, which we denote by Zj in the statement of lemma. 
G. Proof of Theorem |?]

• (a) The i.i.d. result can be exploited in the Kronecker case as follows: 

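$$\log_2 \det\left(H_k H_k^H\right) \stackrel{(a)}{=} \log_2 \det\left(\Lambda_r H_{\mathrm{iid}} \Lambda_t H_{\mathrm{iid}}^H\right) \qquad (120)$$

$$\stackrel{(b)}{=} \log_2 \det\left(\Lambda_t \Lambda_r\right) + \log_2 \det\left(H_{\mathrm{iid}} H_{\mathrm{iid}}^H\right) \qquad (121)$$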



log2 ( 2 ■ P'^f^] ■ (2(iV - ^ + 1)) ) (122) 
i=i ^ ^ 



where (a) follows from the definition of $H_k$, (b) from the fact that $N_t = N_r$ and $\det(AB) = \det(BA)$, and (c) from (67) and the definitions of $\Lambda_t$ and $\Lambda_r$. Using $E\left[\log_2 \det(H_{\mathrm{iid}} H_{\mathrm{iid}}^H)\right]$ [54], we can compute $C_{\mathrm{erg,\,kron}}(\rho)$ to be



a 



erg, kron 



Ariog2 



where 



kron 



N 
i=l 



N 



i=l 



N 



+ i^kron + O 



For the canonical case, we write $H_c H_c^H$ as



HcH, 



H 



Ar2 



HH^, H[z,j]~CAr(0,p,,_ 



Ylij Pi,j 

and compute $C_{\mathrm{erg,\,can}}(\rho)$ as follows:



(123) 



(124) 



(125) 



E 



log2 det(HH 



(a) 



N 



N 



^l0g2« + 5^i? 



N 



i=l 
N 



log2 



Ef=ilH[.,j]p 



^log2(i) + ^log. 



i=l 



f ^3=1 Pid 
V N 



(126) 
(127) 



<^erg,can(p) ~ loga 



'-can 



i=l 



where (a) follows from the approximation (the approximation gets more accurate as $N \to \infty$) in Lemma 5. The convergence in (b) follows from Prop. 3 which is stated and proved next. Since $N_r = N_t = N$, all the above steps are true even if $H_c$ is replaced with $H_c^H$. This leads to the expression for $K_{\mathrm{can}}$ in (75).
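Proposition 3: With the setting as above, we have

$$\log_2\left(\frac{\sum_{j=1}^{N} |H[i,j]|^2}{N}\right) \;\xrightarrow{N \to \infty}\; \log_2\left(\frac{\sum_{j=1}^{N} p_{i,j}}{N}\right) \quad \text{in mean for any } i. \qquad (128)$$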
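Proof: We decompose the left-hand side as

$$E\left[\log\left(\frac{\sum_{j=1}^{N} |H[i,j]|^2}{N}\right)\right] = E\left[\log\left(\frac{\sum_{j=1}^{N} |H[i,j]|^2 \chi\left(|H[i,j]|^2 \le K\right)}{N}\right)\right] + E\left[\log\left(1 + \frac{\sum_{j=1}^{N} |H[i,j]|^2 \chi\left(|H[i,j]|^2 > K\right)}{\sum_{j=1}^{N} |H[i,j]|^2 \chi\left(|H[i,j]|^2 \le K\right)}\right)\right] \qquad (129)$$

for some $K > 0$ fixed, where $\chi(\cdot)$ denotes the indicator function.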

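For the first term, note that the weak law of large numbers states that for all $i$

$$\frac{\sum_{j=1}^{N} |H[i,j]|^2 \chi\left(|H[i,j]|^2 \le K\right)}{N} \;\to\; \frac{\sum_{j=1}^{N} E\left[|H[i,j]|^2 \chi\left(|H[i,j]|^2 \le K\right)\right]}{N} \triangleq P \qquad (130)$$

$$= \frac{\sum_{j=1}^{N} \left(p_{i,j} - (p_{i,j} + K) e^{-K/p_{i,j}}\right)}{N}. \qquad (131)$$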



The convergence is in probability and hence, also weakly [62, p. 310]. The second equality 
follows from a routine expectation computation. Since log(-) is a continuous function and 
the limit random variable is a constant, following [62, p. 316, p. 310] we also have 



$$\log\left(\frac{\sum_{j=1}^{N} |H[i,j]|^2 \chi\left(|H[i,j]|^2 \le K\right)}{N}\right) \to \log(P). \qquad (132)$$



The above convergence can further be strengthened to convergence in mean since the random 
variables are bounded by K for all i and all choices of N [62, p. 310]. 
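The "routine expectation computation" behind (131) can be checked numerically. The sketch below assumes that $|H[i,j]|^2$ is exponentially distributed with mean $p_{i,j}$ (as holds for a complex Gaussian entry of variance $p_{i,j}$) and compares the closed form $p - (p + K)e^{-K/p}$ with a midpoint-rule integral; the function names and the values of $p$ and $K$ are illustrative assumptions.

```python
import math

def truncated_mean(p, K):
    """Closed form of E[X 1(X <= K)] for X exponential with mean p."""
    return p - (p + K) * math.exp(-K / p)

def tail_mean(p, K):
    """Closed form of E[X 1(X > K)] for X exponential with mean p."""
    return (p + K) * math.exp(-K / p)

def numeric_truncated_mean(p, K, steps=20000):
    """Midpoint-rule integral of x * exp(-x/p) / p over [0, K]."""
    h = K / steps
    total = 0.0
    for n in range(steps):
        x = (n + 0.5) * h
        total += x * math.exp(-x / p) / p
    return total * h

p, K = 0.7, 3.0   # hypothetical variance and truncation level
closed = truncated_mean(p, K)
numeric = numeric_truncated_mean(p, K)
```

The two truncated moments add up to $p$, and the tail part $(p + K)e^{-K/p}$ is the quantity that is made small by choosing $K$ large in the remainder of the proof.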
For the second term, we use the following lower bound: 

$$|H[i,j]|^2 \chi\left(|H[i,j]|^2 \le K\right) \ge |H[i,j]|^2 \chi\left(\epsilon \le |H[i,j]|^2 \le K\right) \qquad (133)$$

for some $0 < \epsilon < K$. Using this, we can upper bound the second term by



E 



< E 



log 1 + 



log 1 + 



Ef=i|H[z,j]|2x (e<|H[2,j]|2<i^; 

Ef=i|H[z,j]Px(|H[z,j]p>i^ 
We 



< 



Ne 



Ne 

K 



(134) 



(135) 



(136) 



where the second step follows from the log-inequality. Combining these two results by choosing $K$ sufficiently large to ensure that $(p_{i,j} + K) e^{-K/p_{i,j}}$ is sufficiently small for all $i, j$ and $\epsilon$ finite, we obtain the conclusion as in the statement of the proposition. ■
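The convergence claimed in Proposition 3 can be illustrated by a small Monte Carlo experiment. This is only a sketch under stated assumptions: the row variance profile (values cycling through 0.5, 0.75, 1.0, 1.25, 1.5) is hypothetical, $|H[i,j]|^2$ is drawn as an exponential random variable with mean $p_{i,j}$, and the function name is not from the original.

```python
import math
import random

def mean_abs_log_gap(N, trials=100, seed=1):
    """Monte Carlo estimate of E| log2(sum_j |H[i,j]|^2 / N) - log2(sum_j p_ij / N) |
    for one row i with an assumed variance profile."""
    rng = random.Random(seed)
    # Hypothetical variances for row i.
    p = [0.5 + 0.25 * (j % 5) for j in range(N)]
    target = math.log2(sum(p) / N)
    gap = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. 1 / mean.
        s = sum(rng.expovariate(1.0 / pj) for pj in p)
        gap += abs(math.log2(s / N) - target)
    return gap / trials

gap_small = mean_abs_log_gap(10)
gap_large = mean_abs_log_gap(1000)
```

The estimated gap shrinks as $N$ grows, which is the hardening behaviour that step (b) of the capacity computation relies on.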
(b) In the large-system regime, we have 

N 

Cerg, can(p) — C'erg, kron (p) ~ K^an — -ft'kron = ^ log 



i=i \N^/^iPi,iJ2kPk, 
( \ 

logs 



(137) 



N 



N 



( -^-^row DOW ' AM(-ol pow 



row pow 

row pow ' GrMcol pow 



(138) 



(139) 



where $P_i = \sum_{j=1}^{N} P_c[j,i]$ and $Q_i = \sum_{j=1}^{N} P_c[i,j]$ are the column and the row powers, respectively, such that $\sum_i P_i = \sum_i Q_i = N^2$.

An application of the arithmetic-geometric mean inequality shows that $K_{\mathrm{can}} \ge K_{\mathrm{kron}}$. For an upper bound on the difference, we use the reverse arithmetic-geometric mean inequality [60, Theorem 3, p. 124] due to Docev, which is stated here for convenience.
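Lemma 9: If $x = [x_1, \cdots, x_n]$ is a positive $n$-tuple with $K = \frac{\max_i x_i}{\min_i x_i}$, then

$$\frac{\mathrm{AM}_x}{\mathrm{GM}_x} \le \frac{(K-1) K^{\frac{1}{K-1}}}{e \log(K)}. \qquad (140)$$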



Since $P_c$ is rank-$N$, we apply Lemma 9 with $K = N^2 - N + 1$ for an upper bound, and the result is (77).
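The reverse arithmetic-geometric mean bound of Lemma 9 can also be checked numerically. The sketch below is illustrative only: the function names and the sampling range $[1, 50]$ are assumptions, $K$ is computed as the ratio of the largest to the smallest entry, and $\log$ is taken as the natural logarithm.

```python
import math
import random

def am_gm_ratio(x):
    """Ratio of the arithmetic mean to the geometric mean of a positive tuple."""
    am = sum(x) / len(x)
    gm = math.exp(sum(math.log(v) for v in x) / len(x))
    return am / gm

def docev_bound(K):
    """Bound (K - 1) K^(1/(K-1)) / (e ln K) of Lemma 9, for K > 1."""
    return (K - 1.0) * K ** (1.0 / (K - 1.0)) / (math.e * math.log(K))

def check_lemma9(trials=1000, n=6, seed=0):
    """Test 1 <= AM/GM <= bound on random positive tuples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(1.0, 50.0) for _ in range(n)]
        K = max(x) / min(x)
        ratio = am_gm_ratio(x)
        if not (1.0 - 1e-12 <= ratio <= docev_bound(K) + 1e-12):
            return False
    return True
```

For the two-point tuple $[1, 2]$ the ratio is about 1.0607 against a bound of about 1.0615, which shows how tight the bound can be for small $K$; the bound grows with $K$, consistent with the gap between $K_{\mathrm{can}}$ and $K_{\mathrm{kron}}$ being controlled by the disparity in the row and column powers.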

(c) Equality in the application of the arithmetic-geometric mean inequality is possible if and only if $P_i = Q_i = N$ for all $i$. It is straightforward to check that a channel satisfying this property has to be necessarily regular (see Footnote 3). The conclusion for the lower bound follows by plugging the choice of $P_c$ in (fTSl) in the capacity expressions.